High CPU utilization but low load average











25















We are running into a strange behavior where we see high CPU utilization but quite low load average.



The behavior is best illustrated by the following graphs from our monitoring system.



[Figure: CPU usage and load]



At about 11:57 the CPU utilization goes from 25% to 75%. The load average is not significantly changed.



We run servers with 12 cores with 2 hyper threads each. The OS sees this as 24 CPUs.



The CPU utilization data is collected by running /usr/bin/mpstat 60 1 each minute. The chart above shows the %usr column of the all row. I am certain this is the average across CPUs, not the "stacked" utilization. While the chart shows 75% utilization, top shows one process using about 2000% "stacked" CPU.



The load average figure is taken from /proc/loadavg each minute.



uname -a gives:



Linux ab04 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux


Linux dist is Red Hat Enterprise Linux Server release 6.3 (Santiago)



We run a couple of Java web applications under fairly heavy load on the machines, think 100 requests/s per machine.



If I interpret the CPU utilization data correctly, when we have 75% CPU utilization it means that our CPUs are executing a process 75% of the time, on average. However, if our CPUs are busy 75% of the time, shouldn't we see higher load average? How could the CPUs be 75% busy while we only have 2-4 jobs in the run queue?



Are we interpreting our data correctly? What can cause this behavior?
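As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, and it assumes that per-CPU average utilization and "stacked" utilization differ only by a factor of the CPU count (24 here).

```python
def stacked_to_average(stacked_pct, ncpus):
    """Convert a top-style "stacked" percentage to a per-CPU average."""
    return stacked_pct / ncpus

def average_to_stacked(average_pct, ncpus):
    """Convert a per-CPU average percentage to a "stacked" percentage."""
    return average_pct * ncpus

def expected_running_tasks(average_pct, ncpus):
    """Time-averaged number of tasks on a CPU implied by the utilization."""
    return average_pct / 100.0 * ncpus

NCPUS = 24  # 12 cores x 2 hyper-threads, as described above

print(average_to_stacked(75, NCPUS))              # 1800
print(round(stacked_to_average(2000, NCPUS), 1))  # 83.3
print(expected_running_tasks(75, NCPUS))          # 18.0
```

So 75% average utilization on 24 CPUs is broadly consistent with the roughly 2000% seen in top, and it corresponds to about 18 tasks executing at any moment on average, far more than the 2-4 reported in the load average.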


































  • Is the monitoring system showing normalized CPU load (load / #CPUs)? Regular Linux CPU load is hard to compare across systems with different core/cpu counts so some tools use a normalized CPU load instead.

    – Brian
    Feb 12 '15 at 12:08












  • Do you mean dividing each data point by the number of CPUs? I.e. loadavg/24 in our case? I can easily create such a chart from the data if that helps.

    – K Erlandsson
    Feb 12 '15 at 12:14











  • I was suggesting your chart may already be showing that.

    – Brian
    Feb 12 '15 at 12:51












  • Ah, sorry for misunderstanding you. It would have been a nice explanation, but unfortunately it is the system-wide load average that is shown. I just triple checked.

    – K Erlandsson
    Feb 12 '15 at 12:54
























linux cpu-usage troubleshooting load-average














asked Feb 12 '15 at 11:53 by K Erlandsson, edited Feb 12 '15 at 14:46






















7 Answers


















39














On Linux at least, the load average and CPU utilization are actually two different things. Load average is a measurement of how many tasks are running or waiting in a kernel run queue, plus tasks blocked on uninterruptible I/O such as disk activity, averaged over a period of time. CPU utilization is a measure of how busy the CPU is right now (over the last sampling interval). The most that a single CPU thread pegged at 100% for one minute can "contribute" to the 1 minute load average is 1. A 4 core CPU with hyperthreading (8 virtual cores) all at 100% for 1 minute would contribute 8 to the 1 minute load average.



Often these two numbers correlate, but you can't think of them as the same. You can have a high load with nearly 0% CPU utilization (such as when a lot of tasks are stuck in an IO wait state), and you can have a load of 1 and 100% CPU when a single-threaded process is running full tilt. Also, for short periods of time you can see the CPU at close to 100% while the load is still below 1 because the average hasn't "caught up" yet.



I've seen a server have a load of over 15,000 (yes really that's not a typo) and a CPU % of close to 0%. It happened because a Samba share was having issues and lots and lots of clients started getting stuck in an IO wait state. Chances are if you are seeing a regular high load number with no corresponding CPU activity, you are having a storage problem of some kind. On virtual machines this can also mean that there are other VMs heavily competing for storage resources on the same VM host.



High load is also not necessarily a bad thing. Most of the time it just means the system is being utilized to its fullest capacity, or maybe is beyond its capability to keep up (if the load number is higher than the number of processor cores). At a place where I used to be a sysadmin, they had someone who watched the load average on their primary system more closely than Nagios did. When the load was high, they would call me 24/7 faster than you could say SMTP. Most of the time nothing was actually wrong, but they associated the load number with something being wrong and watched it like a hawk. After checking, my response was usually that the system was just doing its job. Of course this was the same place where the load got up over 15000 (not the same server though), so sometimes it does mean something is wrong. You have to consider the purpose of your system. If it's a workhorse, then expect the load to be naturally high.
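The comparison between the load number and the number of processor cores can be sketched in Python as follows. This is a minimal illustration, not a monitoring recipe: the function names are hypothetical, and the threshold of 1.0 per CPU is just the rule of thumb from the paragraph above.

```python
import os

def load_per_cpu(load, ncpus):
    """Return the load average divided by the number of logical CPUs."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return load / ncpus

def is_oversubscribed(load, ncpus):
    """True when the load average exceeds the number of logical CPUs."""
    return load_per_cpu(load, ncpus) > 1.0

if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()   # available on Unix-like systems
    ncpus = os.cpu_count() or 1          # logical CPUs, hyper-threads included
    print(f"load {one_minute:.2f} on {ncpus} CPUs "
          f"-> {load_per_cpu(one_minute, ncpus):.2f} per CPU")
```

A value well below 1 per CPU suggests spare capacity; a value above 1 only says that tasks are queueing or blocked, not why.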































  • How do you mean that I can have a load of 1 and 100% CPU with a single threaded process? What kind of threads are you talking about? If we consider our Java processes, they have tons of threads, but I was under the assumption that the threads were treated as processes from the perspective of the OS (they have separate PIDs on Linux after all). Could it be so that a single multi threaded java process is only counted as one task from a load average perspective?

    – K Erlandsson
    Feb 13 '15 at 9:17











  • I just did a test on my own: the threads in a Java process contribute to the load average as if they were separate processes (i.e. a Java class that runs 10 threads in a busy-wait loop gives me a load close to 10). I would appreciate a clarification about the threaded process you mentioned above. Thank you!

    – K Erlandsson
    Feb 13 '15 at 9:26











  • I mean if you have a non-multithreading process (i.e., one that just uses a single CPU at a time). For instance, if you just write a simple C program that runs a busy loop, it's just a single thread running and it uses only 1 CPU at a time.

    – deltaray
    Feb 13 '15 at 20:36











  • All information I have found says that threads count as separate processes when seen from the kernel and when calculating load. Hence I fail to see how I could have a multi threaded process on full tilt resulting in 1 load and 100% CPU on a multi-CPU system. Could you please help me understand how you mean?

    – K Erlandsson
    Feb 14 '15 at 13:13











  • For anyone looking for more detail: "Linux Load Averages: Solving the Mystery" by Brendan Gregg had all the answers I ever needed.

    – Nickolay
    Sep 12 '18 at 12:50


















23














Load is a very deceptive number. Take it with a grain of salt.



If you spawn many tasks in very quick succession and each completes very quickly, they are rarely in the run queue at the moment the kernel samples it, so they barely register in the load (the kernel counts load once every five seconds).
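To make the sampling effect concrete, here is a floating-point sketch of a 1-minute exponentially damped average that is updated once every 5 seconds. It is a simplification under stated assumptions: the kernel does this in fixed-point arithmetic, and the function names and sample sequences below are invented for illustration.

```python
import math

SAMPLE_INTERVAL_S = 5.0   # sampling period mentioned above
WINDOW_S = 60.0           # time constant of the 1-minute average

def update_load(load, active_tasks):
    """Apply one sampling step of an exponentially damped average."""
    decay = math.exp(-SAMPLE_INTERVAL_S / WINDOW_S)
    return load * decay + active_tasks * (1.0 - decay)

def simulate(samples, load=0.0):
    """Feed a sequence of sampled task counts through the average."""
    for active in samples:
        load = update_load(load, active)
    return load

# 8 tasks active at every sampling instant for one minute (12 samples).
steady = simulate([8] * 12)
# Tasks that start and finish between sampling instants are never seen.
bursty = simulate([0] * 12)
print(f"steady: {steady:.2f}, bursty: {bursty:.2f}")  # steady: 5.06, bursty: 0.00
```

Both workloads could burn similar amounts of CPU time; only the tasks that happen to be active at the sampling instants move the average.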



Consider this example: on my host, which has 8 logical cores, this Python script registers a large CPU usage in top (about 85%), yet hardly any load.



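import os, sys

while True:
    # Spawn 8 short-lived children, one per logical core
    for j in range(8):
        parent = os.fork()
        if not parent:
            # Child: count to 10000, then exit
            n = 0
            for i in range(10000):
                n += 1
            sys.exit(0)
    # Parent: reap all 8 children before spawning the next batch
    for j in range(8):
        os.wait()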


Another implementation; this one avoids waiting in groups of 8 (which would skew the test). Here the parent always tries to keep the number of children equal to the number of active CPUs, so it will be much busier than the first method and hopefully more accurate.



/* Compile with flags -O0 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <err.h>
#include <errno.h>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERATIONS 50000

int maxchild = 0;
volatile int numspawned = 0;

void childhandle(
    int signal)
{
  int stat;
  /* Handle all exited children, until none are left to handle */
  while (waitpid(-1, &stat, WNOHANG) > 0) {
    numspawned--;
  }
}

/* Stupid task for our children to do */
void do_task(
    void)
{
  int i, j = 0;
  for (i=0; i < ITERATIONS; i++)
    j++;
  exit(0);
}

int main() {
  pid_t pid;

  struct sigaction act;
  sigset_t sigs, old;

  maxchild = sysconf(_SC_NPROCESSORS_ONLN);

  /* Setup child handler */
  memset(&act, 0, sizeof(act));
  act.sa_handler = childhandle;
  if (sigaction(SIGCHLD, &act, NULL) < 0)
    err(EXIT_FAILURE, "sigaction");

  /* Defer the sigchild signal */
  sigemptyset(&sigs);
  sigaddset(&sigs, SIGCHLD);
  if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
    err(EXIT_FAILURE, "sigprocmask");

  /* Create processes, where our maxchild value is not met */
  while (1) {
    while (numspawned < maxchild) {
      pid = fork();
      if (pid < 0)
        err(EXIT_FAILURE, "fork");

      else if (pid == 0) /* child process */
        do_task();
      else               /* parent */
        numspawned++;
    }
    /* Atomically unblocks signal, handler then picks it up, reblocks on finish */
    if (sigsuspend(&old) < 0 && errno != EINTR)
      err(EXIT_FAILURE, "sigsuspend");
  }
}




The reason for this behaviour is that the algorithm spends more time creating child processes than it does running the actual task (counting to 10000). Tasks not yet created cannot count towards the 'runnable' state, yet they take up %sys CPU time as they are spawned.



So, the answer could really be in your case that whatever work is being done spawns large numbers of tasks in quick succession (threads, or processes).































  • Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded, we do only see a very slight increase in system time). Could many small tasks be the explanation anyways? If the load average is sampled every 5 seconds, is the CPU utilization data as given by mpstat more frequently sampled?

    – K Erlandsson
    Feb 12 '15 at 13:23











  • I am not familiar with how CPU sampling is done there. Never read the kernel source regarding it. In my example %usr was 70%+ and %sys was 15%.

    – Matthew Ife
    Feb 12 '15 at 13:30












  • Good examples !

    – Xavier Lucas
    Feb 12 '15 at 18:06


















5














If the load average doesn't increase much, it just means that your hardware specs and the nature of the tasks being processed result in good overall throughput, so tasks don't pile up in the task queue for long.



If there were contention, for instance because the average task complexity is too high or the average task takes too many CPU cycles to process, then yes, the load average would increase.



UPDATE :



It may not be clear in my original answer, so I'm clarifying now :



The load average is calculated from: loadavg = tasks running + tasks waiting (for cores) + tasks blocked, averaged over time.



You can definitely have good throughput and get close to a load average of 24 without any penalty on task processing time. On the other hand, you can also have 2-4 periodic tasks not completing quickly enough; then you will see the number of tasks waiting (for CPU cycles) grow, and you will eventually reach a high load average. Another thing that can happen is tasks issuing synchronous I/O operations that block a core, lowering the throughput and making the queue of waiting tasks grow (in that case you may see the iowait metric change).
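To look at these quantities directly, the line in /proc/loadavg can be split into its fields. The sketch below assumes the usual Linux layout (three averages, then a runnable/total pair of scheduling entities, then the most recently created PID). The class and function names are hypothetical and the sample line is made up. Note that the runnable count in the fourth field is an instantaneous value and does not include blocked tasks.

```python
from typing import NamedTuple

class LoadAvg(NamedTuple):
    one: float       # 1-minute load average
    five: float      # 5-minute load average
    fifteen: float   # 15-minute load average
    runnable: int    # currently runnable scheduling entities
    total: int       # scheduling entities that currently exist
    last_pid: int    # PID of the most recently created process

def parse_loadavg(text):
    """Parse one line in the format used by /proc/loadavg."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(runnable), int(total), int(fields[4]))

sample = "2.31 3.10 2.95 3/1482 27341"   # made-up sample line
print(parse_loadavg(sample))
```

On a live Linux system the same function could be fed with the contents of /proc/loadavg, for example `parse_loadavg(open("/proc/loadavg").read())`.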































  • It is my understanding that load average also includes the tasks currently executing. That would mean we definitely can have an increase in load average without actual contention for the CPUs. Or am I mistaken/misunderstanding you?

    – K Erlandsson
    Feb 12 '15 at 13:24












  • @KristofferE You are completely right. The actual formula is loadavg = tasks running + tasks waiting (for available cores) + tasks blocked. This means you can have a load average of 24 with no task waiting or blocked, thus having just "full usage" of your hardware capacity without any contention. As you seemed confused about load average vs number of processes running vs CPU usage, I mainly focused my answer on explaining how a load average can still grow with so few running processes overall. It may indeed not be that clear after re-reading it.

    – Xavier Lucas
    Feb 12 '15 at 14:26



















2














Load average includes tasks that are blocked on disk IO, so you can easily have zero cpu utilization and a load average of 10 just by having 10 tasks all trying to read from a very slow disk. Thus it is common for a busy server to start thrashing the disk and all of the seeking causes lots of blocked tasks, driving up the load average, while cpu usage drops, since all of the tasks are blocked on the disk.




































    1














    While Matthew Ife's answer was very helpful and led us in the right direction, it was not exactly what caused the behavior in our case. We have a multi-threaded Java application that uses thread pooling, so no work is spent creating the actual tasks.



    However, the actual work the threads do is short-lived and includes IO waits or synchronization waits. As Matthew mentions in his answer, the load average is sampled by the OS, so short-lived tasks can be missed.



    I made a Java program that reproduces the behavior. The following Java class generates a CPU utilization of 28% (650% stacked) on one of our servers. While doing this, the load average is about 1.3. The key here is the sleep() inside the thread; without it the load calculation is correct.



    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class MultiThreadLoad {

        // Fixed pool of 200 threads; when the queue is full the caller runs the task itself
        private ThreadPoolExecutor e = new ThreadPoolExecutor(200, 200, 0l, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(1000), new ThreadPoolExecutor.CallerRunsPolicy());

        public void load() {
            while (true) {
                e.execute(new Runnable() {

                    @Override
                    public void run() {
                        // Idle first, then do a short burst of CPU work
                        sleep100Ms();
                        for (long i = 0; i < 5000000l; i++)
                            ;
                    }

                    private void sleep100Ms() {
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                    }
                });
            }
        }

        public static void main(String[] args) {
            new MultiThreadLoad().load();
        }

    }





    To summarize, the theory is that the threads in our applications idle a lot and then perform short-lived work, so the tasks are not correctly sampled by the load average calculation.




































      0














      Load average is the average number of processes in the CPU queue. It is specific to each system; you cannot say that one LA is generically high on all systems and another is low.
      So you have 12 cores, and for LA to increase significantly the number of processes must be really high.



      Another question is what is meant by the "CPU Usage" graph. If it's taken from SNMP, like it should be, and your SNMP implementation is net-snmp, then it just stacks the CPU load from each of your 12 CPUs. So for net-snmp the total amount of CPU load is 1200%.



      If my assumptions are correct, then the CPU usage didn't increase significantly.
      Thus, LA didn't increase significantly.





























      • The cpu usage is taken from mpstat, the all row. I am fairly certain it is an average across all CPUs, it is not stacked. For example, when the problem occurs, top shows 2000% CPU usage for one process. That is stacked usage.

        – K Erlandsson
        Feb 12 '15 at 12:31


















      0














      The scenario here is not particularly unexpected, although it is a little unusual. What Xavier touches on, but does not develop much, is that although Linux (by default) and most flavours of Unix implement pre-emptive multi-tasking, on a healthy machine tasks will rarely be pre-empted. Each task is allotted a time slice for occupying the CPU; it is only pre-empted if it exceeds this time and there are other tasks waiting to run (note that load reports the average number of processes both on the CPU and waiting to run). Most of the time, a process will yield rather than being interrupted.



      (In general you only need to worry about load when it gets close to the number of CPUs, i.e. when the scheduler starts pre-empting tasks.)




      if our CPUs are busy 75% of the time, shouldn't we see higher load average?




      It's all about the pattern of activity. Clearly, increased utilization of the CPU by some tasks (most likely a small minority) was not having an adverse effect on the processing of other tasks. If you could isolate the transactions being processed, I would expect you would see a new group emerging during the slowdown, while the existing task set was not affected.



      update



      One common scenario where high CPU can occur without a big increase in load is where a task triggers one (or a sequence) of other tasks, e.g. on receipt of a network request, the handler routes the request to a separate thread, and the separate thread then makes some asynchronous calls to other processes. The sampling of the run queue causes the load to be reported lower than it really is, but it does not rise linearly with CPU usage: the chain of tasks triggered would not have been runnable without the initial event, and because they occur (more or less) sequentially, the run queue is not inflated.































      • The OP originally provided indications that the aggregate CPU% was "2000%" suggesting there are many tasks using up CPU, rather than just 1 busy process. If it was a consistent 2000% for a minute you'd normally anticipate the load be 20-ish.

        – Matthew Ife
        Feb 12 '15 at 14:10











      • ...in a comment, not in the question, and he's not very sure about that. In the absence of the 'ALL' option, mpstat reports the total % usage not the average. But that doesn't change the answer - it's about the pattern of activity.

        – symcbean
        Feb 12 '15 at 14:40











      • I'm 100% positive that the CPU util we see in the chart is the "average per CPU". Mpstat is run without ALL, but that only leaves out the per-CPU info, the all row still shows the average per CPU. I will clarify the question.

        – K Erlandsson
        Feb 12 '15 at 14:43












      • Could you please elaborate on your last section a bit? I fail to grasp what you mean, and the part of my question you cited is the part I have the most trouble understanding.

        – K Erlandsson
        Feb 12 '15 at 14:44











      7 Answers








      39














      On Linux at least, load average and CPU utilization are two different things. Load average measures how many tasks are running or waiting in a kernel run queue (waiting not just for CPU time but also for disk activity), averaged over a period of time. CPU utilization measures how busy the CPU is right now. The most load that a single CPU thread pegged at 100% for one minute can "contribute" to the 1-minute load average is 1. A 4-core CPU with hyperthreading (8 virtual cores), all at 100% for one minute, would contribute 8 to the 1-minute load average.
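To look at the two numbers side by side, here is a minimal sketch (not part of the original answer) that derives both from /proc on Linux. The function names are made up for illustration, and the sketch assumes the usual layout of /proc/loadavg and of the aggregate "cpu" line in /proc/stat.

```python
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_cpu_line(text):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep only the first 8 counters (user..steal); guest time is
            # already included in user/nice, so adding it would double count.
            ticks = [int(x) for x in fields[1:9]]
            idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
            return idle, sum(ticks)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 0.0 if elapsed <= 0 else 1.0 - (idle1 - idle0) / elapsed


def read(path):
    with open(path) as f:
        return f.read()


def main(interval=1.0):
    """Print the load averages and the CPU utilization over `interval` seconds."""
    first = read("/proc/stat")
    time.sleep(interval)
    second = read("/proc/stat")
    print("load averages:", parse_loadavg(read("/proc/loadavg")))
    print("CPU busy: {:.0%}".format(cpu_busy_fraction(first, second)))
```

Calling main() on a Linux host prints both figures. The utilization is computed over the chosen interval, while the load averages are smoothed over minutes, which is one reason the two can disagree.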



      Often these two numbers follow patterns that correlate with each other, but you can't think of them as the same. You can have a high load with nearly 0% CPU utilization (such as when a lot of IO is stuck in a wait state), and you can have a load of 1 and 100% CPU when a single-threaded process is running full tilt. Also, for short periods you can see the CPU at close to 100% while the load is still below 1, because the averaged metric hasn't "caught up" yet.
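The "catching up" can be made concrete. This is a sketch, assuming the Linux scheme in which the kernel samples the number of active tasks about every 5 seconds and folds each sample into an exponentially damped average with a 60-second time constant for the 1-minute figure. The kernel uses fixed-point arithmetic; floating point is used here for readability, and simulate_load is an illustrative name.

```python
import math

SAMPLE_INTERVAL = 5.0   # seconds between samples
WINDOW = 60.0           # time constant of the 1-minute average


def simulate_load(active_tasks, seconds, start=0.0):
    """Return the 1-minute load after `seconds` with a constant number of active tasks."""
    decay = math.exp(-SAMPLE_INTERVAL / WINDOW)
    load = start
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        # Each sample pulls the average a little way towards the current count.
        load = load * decay + active_tasks * (1.0 - decay)
    return load


for elapsed in (5, 30, 60, 300):
    print("after {:>3} s: load = {:.2f}".format(elapsed, simulate_load(1, elapsed)))
```

Under these assumptions, one busy thread on an otherwise idle machine shows a load of about 0.08 after 5 seconds and about 0.63 after a full minute, and only approaches 1 after several minutes.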



      I've seen a server with a load of over 15,000 (yes, really, that's not a typo) and CPU usage close to 0%. It happened because a Samba share was having issues and lots and lots of clients got stuck in an IO wait state. Chances are that if you regularly see a high load number with no corresponding CPU activity, you have a storage problem of some kind. On virtual machines this can also mean that other VMs are competing heavily for storage resources on the same VM host.
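One way to check for this situation is to count tasks by state. The sketch below is an illustration rather than anything from the original answer: it assumes the Linux /proc/<pid>/task/<tid>/stat layout, in which the one-letter state follows the parenthesised command name, and the helper names are hypothetical. State D is uninterruptible sleep (typically waiting on IO) and R is running or runnable.

```python
import glob


def task_state(stat_text):
    """Return the one-letter state from the contents of a /proc stat file."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(stat_texts):
    """Count running (R) and uninterruptible (D) tasks in an iterable of stat contents."""
    counts = {"R": 0, "D": 0}
    for text in stat_texts:
        state = task_state(text)
        if state in counts:
            counts[state] += 1
    return counts


def read_all_task_stats():
    """Yield stat contents for every thread of every process visible in /proc."""
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                yield f.read()
        except OSError:
            continue  # the task exited between listing and reading
```

On a Linux host, count_states(read_all_task_stats()) gives a snapshot. Many D tasks alongside few R tasks and an idle CPU would point towards storage rather than computation.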



      High load is also not necessarily a bad thing. Most of the time it just means the system is being utilized to its fullest capacity, or perhaps is being pushed beyond its capability to keep up (if the load number is higher than the number of processor cores). At a place where I used to be a sysadmin, they had someone who watched the load average on their primary system more closely than Nagios did. When the load was high, they would call me 24/7 faster than you could say SMTP. Most of the time nothing was actually wrong, but they associated the load number with something being wrong and watched it like a hawk. After checking, my response was usually that the system was just doing its job. Of course, this was the same place where the load got up over 15,000 (not the same server, though), so sometimes it does mean something is wrong. You have to consider the purpose of your system. If it's a workhorse, expect the load to be naturally high.
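The comparison against the number of cores can be written down as a small sketch. The helper names are hypothetical; os.getloadavg() and os.cpu_count() are the standard-library calls, and the threshold of one task per core is simply the rule of thumb from the paragraph above, not a hard limit.

```python
import os


def load_per_core(load, cores):
    """Return the load average divided by the number of logical cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def describe(load, cores):
    """Give a rough reading of a load figure relative to the core count."""
    ratio = load_per_core(load, cores)
    if ratio > 1.0:
        return "more active tasks than cores ({:.2f} per core)".format(ratio)
    return "within core count ({:.2f} per core)".format(ratio)


def main():
    """Describe the current 1-minute load of this host."""
    one_minute = os.getloadavg()[0]
    cores = os.cpu_count() or 1
    print(describe(one_minute, cores))
```

A load of 12 on 8 logical cores, for instance, comes out as 1.50 per core. Whether that is a problem still depends on what the machine is for.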































      • What do you mean by saying I can have a load of 1 and 100% CPU with a single-threaded process? What kind of threads are you talking about? If we consider our Java processes, they have tons of threads, but I was under the assumption that threads are treated as processes from the perspective of the OS (they have separate PIDs on Linux, after all). Could it be that a single multi-threaded Java process is only counted as one task from a load average perspective?

        – K Erlandsson
        Feb 13 '15 at 9:17











      • I just did a test of my own: the threads in a Java process contribute to the load average as if they were separate processes (i.e. a Java class that runs 10 threads in a busy-wait loop gives me a load close to 10). I would appreciate a clarification about the threaded process you mentioned above. Thank you!

        – K Erlandsson
        Feb 13 '15 at 9:26











      • I mean if you have a non-multithreading process (i.e., one that uses just a single CPU at a time). For instance, if you write a simple C program that runs a busy loop, it's just a single thread running and it uses only 1 CPU at a time.

        – deltaray
        Feb 13 '15 at 20:36











      • All the information I have found says that threads count as separate processes when seen from the kernel and when calculating load. Hence I fail to see how a multi-threaded process running full tilt could result in a load of 1 and 100% CPU on a multi-CPU system. Could you please help me understand what you mean?

        – K Erlandsson
        Feb 14 '15 at 13:13











      • For anyone looking for more detail: "Linux Load Averages: Solving the Mystery" by Brendan Gregg had all the answers I ever needed.

        – Nickolay
        Sep 12 '18 at 12:50























      edited 47 mins ago

























      answered Feb 12 '15 at 21:38









      deltaray

























      23














      Load is a very deceptive number. Take it with a grain of salt.



      If you spawn many tasks in very quick succession and each completes very quickly, too few of them are in the run queue at any sampling instant to register as load (the kernel samples load once every five seconds).



      Consider this example: on my host, which has 8 logical cores, this Python script registers high CPU usage in top (about 85%), yet hardly any load.



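      import os, sys

      # Repeatedly fork 8 short-lived children, then reap all 8.
      while True:
          for j in range(8):
              parent = os.fork()
              if not parent:
                  # Child: do a small amount of work, then exit.
                  n = 0
                  for i in range(10000):
                      n += 1
                  sys.exit(0)
          for j in range(8):
              os.wait()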


      Here is another implementation, which avoids waiting in groups of 8 (which would skew the test). The parent always tries to keep the number of children equal to the number of active CPUs, so it will be much busier than the first method and hopefully more accurate.



      /* Compile with flags -O0 */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      #include <err.h>
      #include <errno.h>

      #include <signal.h>
      #include <sys/types.h>
      #include <sys/wait.h>

      #define ITERATIONS 50000

      int maxchild = 0;
      volatile int numspawned = 0;

      void childhandle(int sig)
      {
          int stat;

          (void)sig;
          /* Handle all exited children, until none are left to handle */
          while (waitpid(-1, &stat, WNOHANG) > 0)
              numspawned--;
      }

      /* Stupid task for our children to do */
      void do_task(void)
      {
          int i, j = 0;

          for (i = 0; i < ITERATIONS; i++)
              j++;
          exit(0);
      }

      int main()
      {
          pid_t pid;
          struct sigaction act;
          sigset_t sigs, old;

          maxchild = sysconf(_SC_NPROCESSORS_ONLN);

          /* Setup child handler */
          memset(&act, 0, sizeof(act));
          act.sa_handler = childhandle;
          if (sigaction(SIGCHLD, &act, NULL) < 0)
              err(EXIT_FAILURE, "sigaction");

          /* Defer the SIGCHLD signal */
          sigemptyset(&sigs);
          sigaddset(&sigs, SIGCHLD);
          if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
              err(EXIT_FAILURE, "sigprocmask");

          /* Create processes, where our maxchild value is not met */
          while (1) {
              while (numspawned < maxchild) {
                  pid = fork();
                  if (pid < 0)
                      err(EXIT_FAILURE, "fork");
                  else if (pid == 0) /* child process */
                      do_task();
                  else /* parent */
                      numspawned++;
              }
              /* Atomically unblocks signal, handler then picks it up, reblocks on finish */
              if (sigsuspend(&old) < 0 && errno != EINTR)
                  err(EXIT_FAILURE, "sigsuspend");
          }
      }




      The reason for this behaviour is that the algorithm spends more time creating child processes than it does running the actual task (counting to 10000). Tasks that have not yet been created cannot count towards the 'runnable' state, yet spawning them takes up %sys CPU time.



      So in your case, the answer could well be that whatever work is being done spawns large numbers of tasks (threads or processes) in quick succession.
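One way to test that hypothesis is to measure how fast tasks are being created. The sketch below is an illustration, not something from the original answer: it assumes the "processes" line of /proc/stat on Linux, which counts forks since boot (and, to my understanding, thread creations via clone as well), and the function names are made up.

```python
def forks_since_boot(stat_text):
    """Return the 'processes' counter from the contents of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")


def fork_rate(before, after, interval_seconds):
    """Average number of tasks created per second between two snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    created = forks_since_boot(after) - forks_since_boot(before)
    return created / interval_seconds
```

Reading /proc/stat twice, a few seconds apart, and passing both snapshots to fork_rate gives the creation rate. A rate in the thousands per second while the load average stays low would be consistent with the explanation above.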































      • Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded; we only see a very slight increase in system time). Could many small tasks be the explanation anyway? If the load average is sampled every 5 seconds, is the CPU utilization data given by mpstat sampled more frequently?

        – K Erlandsson
        Feb 12 '15 at 13:23











      • I am not familiar with how CPU sampling is done there. Never read the kernel source regarding it. In my example %usr was 70%+ and %sys was 15%.

        – Matthew Ife
        Feb 12 '15 at 13:30












      • Good examples!

        – Xavier Lucas
        Feb 12 '15 at 18:06















      23














      Load is a very deceptive number. Take it with a grain of salt.



      If you spawn many tasks in very quick succession which complete very quickly, the number of processes in the run queue is too small to register the load for them (the kernel counts load once every five seconds).



      Consider this example, on my host which has 8 logical cores, this python script will register a large CPU usage in top (about 85%), yet hardly any load.



      import os, sys

      while True:
      for j in range(8):
      parent = os.fork()
      if not parent:
      n = 0
      for i in range(10000):
      n += 1
      sys.exit(0)
      for j in range(8):
      os.wait()


      Another implementation, this one avoids wait in groups of 8 (which would skew the test). Here the parent always attempts to keep the number of children at the number of active CPUs such it will be much busier than the first method and hopefully more accurate.



      /* Compile with flags -O0 */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      #include <err.h>
      #include <errno.h>

      #include <sys/signal.h>
      #include <sys/types.h>
      #include <sys/wait.h>

      #define ITERATIONS 50000

      int maxchild = 0;
      volatile int numspawned = 0;

      void childhandle(
      int signal)

      int stat;
      /* Handle all exited children, until none are left to handle */
      while (waitpid(-1, &stat, WNOHANG) > 0)
      numspawned--;



      /* Stupid task for our children to do */
      void do_task(
      void)

      int i,j;
      for (i=0; i < ITERATIONS; i++)
      j++;
      exit(0);


      int main()
      pid_t pid;

      struct sigaction act;
      sigset_t sigs, old;

      maxchild = sysconf(_SC_NPROCESSORS_ONLN);

      /* Setup child handler */
      memset(&act, 0, sizeof(act));
      act.sa_handler = childhandle;
      if (sigaction(SIGCHLD, &act, NULL) < 0)
      err(EXIT_FAILURE, "sigaction");

      /* Defer the sigchild signal */
      sigemptyset(&sigs);
      sigaddset(&sigs, SIGCHLD);
      if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
      err(EXIT_FAILURE, "sigprocmask");

      /* Create processes, where our maxchild value is not met */
      while (1)
      while (numspawned < maxchild)
      pid = fork();
      if (pid < 0)
      err(EXIT_FAILURE, "fork");

      else if (pid == 0) /* child process */
      do_task();
      else /* parent */
      numspawned++;

      /* Atomically unblocks signal, handler then picks it up, reblocks on finish */
      if (sigsuspend(&old) < 0 && errno != EINTR)
      err(EXIT_FAILURE, "sigsuspend");




      The reason for this behaviour is the algorithm spends more time creating child processes than it does running the actual task (counting to 10000). Tasks not yet created cannot count towards the 'runnable' state, yet will take up %sys on CPU time as they are spawned.



      So, the answer could really be in your case that whatever work is being done spawns large numbers of tasks in quick succession (threads, or processes).






      share|improve this answer

























      • Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded, we do only see a very slight increase in system time). Could many small tasks be the explanation anyways? If the load average is sampled every 5 seconds, is the CPU utilization data as given by mpstat more frequently sampled?

        – K Erlandsson
        Feb 12 '15 at 13:23











      • I am not familiar with how CPU sampling is done there. Never read the kernel source regarding it. In my example %usr was 70%+ and %sys was 15%.

        – Matthew Ife
        Feb 12 '15 at 13:30












      • Good examples !

        – Xavier Lucas
        Feb 12 '15 at 18:06













      23












      23








      23







      Load is a very deceptive number. Take it with a grain of salt.



      If you spawn many tasks in very quick succession which complete very quickly, the number of processes in the run queue is too small to register the load for them (the kernel counts load once every five seconds).



      Consider this example, on my host which has 8 logical cores, this python script will register a large CPU usage in top (about 85%), yet hardly any load.



      import os, sys

      while True:
      for j in range(8):
      parent = os.fork()
      if not parent:
      n = 0
      for i in range(10000):
      n += 1
      sys.exit(0)
      for j in range(8):
      os.wait()


      Another implementation, this one avoids wait in groups of 8 (which would skew the test). Here the parent always attempts to keep the number of children at the number of active CPUs such it will be much busier than the first method and hopefully more accurate.



      /* Compile with flags -O0 */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      #include <err.h>
      #include <errno.h>

      #include <signal.h>
      #include <sys/types.h>
      #include <sys/wait.h>

      #define ITERATIONS 50000

      int maxchild = 0;
      volatile int numspawned = 0;

      void childhandle(int signal)
      {
          int stat;
          /* Handle all exited children, until none are left to handle */
          while (waitpid(-1, &stat, WNOHANG) > 0)
              numspawned--;
      }

      /* Stupid task for our children to do */
      void do_task(void)
      {
          int i, j = 0;
          for (i = 0; i < ITERATIONS; i++)
              j++;
          exit(0);
      }

      int main(void)
      {
          pid_t pid;

          struct sigaction act;
          sigset_t sigs, old;

          maxchild = sysconf(_SC_NPROCESSORS_ONLN);

          /* Setup child handler */
          memset(&act, 0, sizeof(act));
          act.sa_handler = childhandle;
          if (sigaction(SIGCHLD, &act, NULL) < 0)
              err(EXIT_FAILURE, "sigaction");

          /* Defer the sigchild signal */
          sigemptyset(&sigs);
          sigaddset(&sigs, SIGCHLD);
          if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
              err(EXIT_FAILURE, "sigprocmask");

          /* Create processes, where our maxchild value is not met */
          while (1) {
              while (numspawned < maxchild) {
                  pid = fork();
                  if (pid < 0)
                      err(EXIT_FAILURE, "fork");
                  else if (pid == 0) /* child process */
                      do_task();
                  else /* parent */
                      numspawned++;
              }
              /* Atomically unblocks signal, handler then picks it up, reblocks on finish */
              if (sigsuspend(&old) < 0 && errno != EINTR)
                  err(EXIT_FAILURE, "sigsuspend");
          }
      }




      The reason for this behaviour is that the algorithm spends more time creating child processes than it does running the actual task (counting to 10000 in the Python version, 50000 in the C version). Tasks not yet created cannot count towards the 'runnable' state, yet they take up %sys CPU time as they are spawned.



      So the answer in your case could well be that whatever work is being done spawns large numbers of tasks (threads or processes) in quick succession.
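The sampling effect described above can be illustrated with a simplified model. This is only a sketch: it assumes the 1-minute figure is an exponentially damped average updated from one instantaneous task count every five seconds, and the names `update_load` and `simulate` are made up for illustration. The kernel's real code uses fixed-point arithmetic and differs in detail.

```python
import math

SAMPLE_PERIOD = 5.0  # seconds between samples, as stated in the answer
WINDOW = 60.0        # averaging window for the 1-minute figure


def update_load(load, n_active, period=SAMPLE_PERIOD, window=WINDOW):
    """Apply one damped update using the task count seen at this sample."""
    decay = math.exp(-period / window)
    return load * decay + n_active * (1.0 - decay)


def simulate(samples, load=0.0):
    """Feed a sequence of sampled task counts through the damped average."""
    for n_active in samples:
        load = update_load(load, n_active)
    return load


if __name__ == "__main__":
    # 8 tasks visible at every sample: the average converges towards 8.
    steady = simulate([8] * 120)
    # Tasks that start and finish between samples are never counted,
    # no matter how much CPU time they burned in the meantime.
    missed = simulate([0] * 120)
    print(f"steady: {steady:.2f}, missed: {missed:.2f}")
    # prints "steady: 8.00, missed: 0.00"
```

In this model only the counts seen at the sampling instants matter, which is why a workload made of very short tasks can show high CPU usage in top while contributing little to the load figure.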






      edited Feb 12 '15 at 14:08

      answered Feb 12 '15 at 13:05

      Matthew Ife












      • Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded; we see only a very slight increase in system time). Could many small tasks be the explanation anyway? If the load average is sampled every 5 seconds, is the CPU utilization data given by mpstat sampled more frequently?

        – K Erlandsson
        Feb 12 '15 at 13:23

      • I am not familiar with how CPU sampling is done there; I have never read the kernel source for it. In my example %usr was 70%+ and %sys was 15%.

        – Matthew Ife
        Feb 12 '15 at 13:30

      • Good examples!

        – Xavier Lucas
        Feb 12 '15 at 18:06

















      5














      If the load average doesn't increase much, it just means that your hardware specs and the nature of the tasks being processed give good overall throughput, so tasks don't pile up in the task queue for any length of time.

      If there were contention, for instance because the average task is too complex or takes too many CPU cycles to process, then yes, the load average would increase.

      UPDATE:

      It may not have been clear in my original answer, so I'm clarifying now:

      The formula behind the load average calculation is: loadavg = tasks running + tasks waiting (for cores) + tasks blocked, averaged over time.

      You can definitely have good throughput and get close to a load average of 24 without any penalty on task processing time. On the other hand, you can also have 2-4 periodic tasks that don't complete quickly enough; you will then see the number of tasks waiting (for CPU cycles) grow, and you will eventually reach a high load average. Another possibility is tasks with outstanding synchronous I/O operations blocking a core, which lowers throughput and makes the queue of waiting tasks grow (in that case you may see the iowait metric change).
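The formula above can be made concrete with a small sketch. It assumes the Linux convention where `ps` and top report state `R` for tasks that are running or waiting for a core and `D` for tasks in uninterruptible sleep (typically blocked on I/O). The helper name `instantaneous_load` and the snapshot are made up for illustration.

```python
from collections import Counter

# Assumption: Linux task state codes as shown by ps/top.
# R = running or waiting for a core, D = uninterruptible sleep (blocked).
COUNTED_STATES = ("R", "D")


def instantaneous_load(states):
    """Count the tasks that contribute to load at one sampling instant."""
    counts = Counter(states)
    return sum(counts[state] for state in COUNTED_STATES)


if __name__ == "__main__":
    # Hypothetical snapshot: 3 runnable, 2 blocked and 5 sleeping tasks.
    snapshot = ["R"] * 3 + ["D"] * 2 + ["S"] * 5
    print(instantaneous_load(snapshot))  # prints 5
```

Sleeping tasks (state `S`) do not count, so a system with thousands of idle threads can still show a load close to zero.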






      • It is my understanding that load average also includes the tasks currently executing. That would mean we definitely can have an increase in load average without actual contention for the CPUs. Or am I mistaken/misunderstanding you?

        – K Erlandsson
        Feb 12 '15 at 13:24












      • @KristofferE You are completely right. The actual formula is loadavg = tasks running + tasks waiting (for available cores) + tasks blocked. This means you can have a load average of 24 with no task waiting or blocked, which is simply "full usage" of your hardware capacity without any contention. As you seemed confused about load average vs number of processes running vs CPU usage, I mainly focused my answer on explaining how a load average can still grow with so few running processes overall. On re-reading it, it was indeed not that clear.

        – Xavier Lucas
        Feb 12 '15 at 14:26
















      edited Feb 12 '15 at 14:37

      answered Feb 12 '15 at 13:00

      Xavier Lucas












      2














      Load average includes tasks that are blocked on disk I/O, so you can easily have zero CPU utilization and a load average of 10 just by having 10 tasks all trying to read from a very slow disk. It is therefore common for a busy server to start thrashing the disk: all the seeking causes lots of blocked tasks, which drives up the load average while CPU usage drops, since all of the tasks are blocked on the disk.
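One way to tell the two situations apart is to compare busy time with I/O wait time. The sketch below works on the aggregate `cpu` line of `/proc/stat` on Linux, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal. The helper names and the two sample lines are made up for illustration; on a real host you would read the line twice, a second or so apart.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Split the aggregate 'cpu' line of /proc/stat into named tick counters."""
    values = line.split()[1:1 + len(FIELDS)]
    return dict(zip(FIELDS, (int(v) for v in values)))


def percentages(prev, curr):
    """Return busy, iowait and idle shares between two snapshots, in percent."""
    deltas = {name: curr[name] - prev[name] for name in curr}
    total = sum(deltas.values())
    if total == 0:
        return {"busy": 0.0, "iowait": 0.0, "idle": 0.0}
    idle = deltas.get("idle", 0)
    iowait = deltas.get("iowait", 0)
    busy = total - idle - iowait
    return {
        "busy": 100.0 * busy / total,
        "iowait": 100.0 * iowait / total,
        "idle": 100.0 * idle / total,
    }


if __name__ == "__main__":
    # Hypothetical snapshots of a host whose tasks are mostly blocked on disk.
    before = parse_cpu_line("cpu 100 0 50 800 50 0 0 0")
    after = parse_cpu_line("cpu 110 0 55 810 125 0 0 0")
    print(percentages(before, after))
    # prints {'busy': 15.0, 'iowait': 75.0, 'idle': 10.0}
```

A high load average together with low busy time and high iowait points to tasks blocked on I/O, not to CPU contention.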






          answered Feb 12 '15 at 20:34

          psusi





















              1














              While Matthew Ife's answer was very helpful and led us in the right direction, it was not exactly what caused the behavior in our case. We have a multi-threaded Java application that uses thread pooling, so no work is spent creating the actual tasks.

              However, the actual work the threads do is short-lived and includes I/O waits or synchronization waits. As Matthew mentions in his answer, the load average is sampled by the OS, so short-lived tasks can be missed.

              I made a Java program that reproduces the behavior. The following Java class generates a CPU utilization of 28% (650% stacked) on one of our servers, while the load average stays at about 1.3. The key here is the sleep() inside the thread; without it, the load calculation is correct.



              import java.util.concurrent.ArrayBlockingQueue;
              import java.util.concurrent.ThreadPoolExecutor;
              import java.util.concurrent.TimeUnit;

              public class MultiThreadLoad {

                  private ThreadPoolExecutor e = new ThreadPoolExecutor(200, 200, 0L, TimeUnit.SECONDS,
                          new ArrayBlockingQueue<Runnable>(1000), new ThreadPoolExecutor.CallerRunsPolicy());

                  public void load() {
                      while (true) {
                          e.execute(new Runnable() {

                              @Override
                              public void run() {
                                  // Idle first, then do a short burst of CPU work
                                  sleep100Ms();
                                  for (long i = 0; i < 5000000L; i++)
                                      ;
                              }

                              private void sleep100Ms() {
                                  try {
                                      Thread.sleep(100);
                                  } catch (InterruptedException e) {
                                      throw new RuntimeException(e);
                                  }
                              }
                          });
                      }
                  }

                  public static void main(String[] args) {
                      new MultiThreadLoad().load();
                  }
              }





              To summarize, the theory is that the threads in our application idle a lot and then perform short-lived work, which is why the tasks are not correctly sampled by the load average calculation.






                  answered Feb 17 '15 at 8:45

                  K Erlandsson





















                      0














                      Load average is the average number of processes in the CPU queue. It is specific to each system: you cannot say that one LA is high on every system and another is low.
                      You have 12 cores, so for LA to increase significantly the number of processes must be really high.



                      Another question is what the "CPU Usage" graph means. If it is taken from SNMP, as it should be, and your SNMP implementation is net-snmp, then it just stacks the CPU load from each of your 12 CPUs. So for net-snmp the maximum total CPU load is 1200%.



                      If my assumptions are correct, then the CPU usage didn't increase significantly.
                      Thus, LA didn't increase significantly either.
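The difference between stacked and averaged CPU percentages is plain arithmetic. The sketch below uses the 12 CPUs mentioned above; the function names are made up for illustration.

```python
def averaged_from_stacked(stacked_percent, n_cpus):
    """Convert a stacked figure (100% per CPU) to an average across CPUs."""
    return stacked_percent / n_cpus


def stacked_from_averaged(averaged_percent, n_cpus):
    """Convert an average across CPUs to a stacked figure (100% per CPU)."""
    return averaged_percent * n_cpus


if __name__ == "__main__":
    # With 12 CPUs, a stacked reading of 1200% means every CPU is fully busy.
    print(averaged_from_stacked(1200, 12))  # prints 100.0
    # A stacked reading of 300% on the same host is a 25% average.
    print(averaged_from_stacked(300, 12))   # prints 25.0
```

Before comparing a CPU graph with the load average, it helps to check which of the two conventions the graph uses.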






                      • The CPU usage is taken from mpstat, the "all" row. I am fairly certain it is an average across all CPUs, not stacked. For example, when the problem occurs, top shows 2000% CPU usage for one process. That is stacked usage.

                        – K Erlandsson
                        Feb 12 '15 at 12:31















                      answered Feb 12 '15 at 12:21









                      drookie


























                      The scenario here is not particularly unexpected, although it is a little unusual. What Xavier touches on, but does not develop much, is that although Linux (by default) and most flavours of Unix implement pre-emptive multitasking, on a healthy machine tasks will rarely be pre-empted. Each task is allotted a time slice for occupying the CPU; it is only pre-empted if it exceeds this time and there are other tasks waiting to run (note that load reports the average number of processes both on the CPU and waiting to run). Most of the time, a process will yield rather than being interrupted.



                      (In general you only need to worry about load when it gets close to the number of CPUs, i.e. when the scheduler starts pre-empting tasks.)
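As a rough illustration of that rule of thumb, the sketch below divides the load average by the number of CPUs. The helper names and the threshold of 1.0 per CPU are assumptions chosen for illustration, not a standard; `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_cpu(load_average: float, ncpus: int) -> float:
    """Return the load average normalised by the number of CPUs."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return load_average / ncpus


def is_saturated(load_average: float, ncpus: int, threshold: float = 1.0) -> bool:
    """True when load per CPU reaches the (assumed) threshold."""
    return load_per_cpu(load_average, ncpus) >= threshold


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()  # Unix only
    ncpus = os.cpu_count() or 1  # cpu_count() may return None
    print(f"1-minute load per CPU: {load_per_cpu(one_min, ncpus):.2f}")
    print(f"saturated: {is_saturated(one_min, ncpus)}")
```

On a 12-CPU machine, for example, a load of 6 works out to 0.5 per CPU, which is well short of the point where tasks start queuing.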




                      if our CPUs are busy 75% of the time, shouldn't we see higher load average?




                      It's all about the pattern of activity: clearly, increased utilization of the CPU by some tasks (most likely a small minority) was not having an adverse effect on the processing of other tasks. If you could isolate the transactions being processed, I would expect you would see a new group emerging during the slowdown, while the extant task set was not affected.



                      update



                      One common scenario where high CPU can occur without a big increase in load is where a task triggers one (or a sequence) of other tasks, e.g. on receipt of a network request, the handler routes the request to a separate thread, and the separate thread then makes some asynchronous calls to other processes. The sampling of the run queue causes the load to be reported lower than it really is, and in any case it does not rise linearly with CPU usage: the chain of tasks triggered would not have been runnable without the initial event, and because they occur (more or less) sequentially, the run queue is not inflated.
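The effect of sequential versus simultaneous work can be sketched with a toy model. This is a deliberate simplification under stated assumptions: time advances in whole ticks, every task needs the same amount of CPU, and "load" is a plain mean of running plus waiting tasks. The real Linux load average is an exponentially damped average and also counts tasks in uninterruptible sleep. The function name `simulate` and the sample workloads are hypothetical.

```python
def simulate(arrivals, work, ncpus, horizon):
    """Return (mean runnable tasks, CPU utilization) over `horizon` ticks.

    arrivals: tick at which each task becomes runnable
    work:     CPU ticks each task needs
    """
    remaining = [work] * len(arrivals)
    runnable_total = 0
    busy_total = 0
    for tick in range(horizon):
        # Tasks that have arrived and still have work left are runnable.
        runnable = [i for i, arrival in enumerate(arrivals)
                    if arrival <= tick and remaining[i] > 0]
        runnable_total += len(runnable)
        # At most `ncpus` of them actually run during this tick (FIFO order).
        for i in runnable[:ncpus]:
            remaining[i] -= 1
            busy_total += 1
    return runnable_total / horizon, busy_total / (ncpus * horizon)


if __name__ == "__main__":
    # A chain: each task becomes runnable only when the previous one finishes.
    print(simulate(arrivals=[0, 1, 2, 3], work=1, ncpus=1, horizon=4))
    # A burst: the same four tasks all become runnable at once.
    print(simulate(arrivals=[0, 0, 0, 0], work=1, ncpus=1, horizon=4))
```

In this model both workloads keep the single CPU fully busy, yet the chain averages one runnable task while the burst averages 2.5, which is the sense in which CPU usage and load need not move together.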































                      • The OP originally provided indications that the aggregate CPU% was "2000%", suggesting there are many tasks using up CPU rather than just one busy process. If it was a consistent 2000% for a minute, you'd normally anticipate the load to be 20-ish.

                        – Matthew Ife
                        Feb 12 '15 at 14:10











                      • ...in a comment, not in the question, and he's not very sure about that. In the absence of the 'ALL' option, mpstat reports the total % usage not the average. But that doesn't change the answer - it's about the pattern of activity.

                        – symcbean
                        Feb 12 '15 at 14:40











                      • I'm 100% positive that the CPU util we see in the chart is the "average per CPU". Mpstat is run without ALL, but that only leaves out the per-CPU info; the all row still shows the average per CPU. I will clarify the question.

                        – K Erlandsson
                        Feb 12 '15 at 14:43












                      • Could you please elaborate on your last section a bit? I fail to grasp what you mean, while the part of my question you cited is the part I have most trouble understanding.

                        – K Erlandsson
                        Feb 12 '15 at 14:44















                      edited Feb 12 '15 at 14:55

























                      answered Feb 12 '15 at 13:55









                      symcbean




























