High CPU utilization but low load average
We are running into strange behavior: we see high CPU utilization but a quite low load average.
The behavior is best illustrated by the following graphs from our monitoring system.
At about 11:57 the CPU utilization goes from 25% to 75%. The load average does not change significantly.
We run servers with 12 cores with 2 hyperthreads each. The OS sees this as 24 CPUs.
The CPU utilization data is collected by running /usr/bin/mpstat 60 1 each minute. The chart above shows the %usr column of the all row. I am certain this is the average per-CPU data, not the "stacked" utilization: while the chart shows 75% utilization, top shows a process using about 2000% "stacked" CPU.
The load average figure is taken from /proc/loadavg each minute.
uname -a gives:
Linux ab04 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
The distribution is Red Hat Enterprise Linux Server release 6.3 (Santiago).
We run a couple of Java web applications under fairly heavy load on the machines, roughly 100 requests/s per machine.
If I interpret the CPU utilization data correctly, 75% CPU utilization means that our CPUs are executing a process 75% of the time, on average. But if our CPUs are busy 75% of the time, shouldn't we see a higher load average? How can the CPUs be 75% busy while we only have 2-4 jobs in the run queue?
Are we interpreting our data correctly? What can cause this behavior?
linux cpu-usage troubleshooting load-average
Is the monitoring system showing normalized CPU load (load / #CPUs)? Regular Linux CPU load is hard to compare across systems with different core/CPU counts, so some tools use a normalized CPU load instead.
– Brian
Feb 12 '15 at 12:08
Do you mean dividing each data point by the number of CPUs? I.e. loadavg/24 in our case? I can easily create such a chart from the data if that helps.
– K Erlandsson
Feb 12 '15 at 12:14
I was suggesting your chart may already be showing that.
– Brian
Feb 12 '15 at 12:51
Ah, sorry for misunderstanding you. It would have been a nice explanation, but unfortunately it is the system-wide load average that is shown. I just triple checked.
– K Erlandsson
Feb 12 '15 at 12:54
Asked Feb 12 '15 at 11:53 by K Erlandsson; edited Feb 12 '15 at 14:46.
7 Answers
On Linux at least, the load average and CPU utilization are actually two different things. Load average is a measurement of how many tasks are waiting in a kernel run queue (not just CPU time but also disk activity) over a period of time. CPU utilization is a measure of how busy the CPU is right now. The most load that a single CPU thread pegged at 100% for one minute can "contribute" to the 1 minute load average is 1. A 4 core CPU with hyperthreading (8 virtual cores) all at 100% for 1 minute would contribute 8 to the 1 minute load average.
Often these two numbers have patterns that correlate with each other, but you can't think of them as the same. You can have a high load with nearly 0% CPU utilization (such as when you have a lot of IO stuck in a wait state), and you can have a load of 1 and 100% CPU when you have a single-threaded process running full tilt. Also, for short periods of time you can see the CPU at close to 100% while the load is still below 1, because the averaged metrics haven't "caught up" yet.
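The "catching up" comes from how the average is computed. Below is a minimal sketch, assuming the kernel's scheme of sampling the number of active (runnable plus uninterruptible) tasks about every 5 seconds and folding each sample into an exponentially damped average. The kernel does this in fixed point (EXP_1 = 1884 out of FIXED_1 = 2048, roughly e^(-5/60)); this sketch uses floating point, and one_minute_load is a hypothetical name.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between load samples (the kernel's LOAD_FREQ is about 5 s)
DECAY_1MIN = math.exp(-SAMPLE_INTERVAL / 60.0)  # per-sample decay factor for the 1-minute average


def one_minute_load(active_tasks: float, seconds: float, start: float = 0.0) -> float:
    """1-minute load after `seconds` of a constant number of active tasks, starting from `start`."""
    load = start
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        # Same recurrence as the kernel's calc_load, but in floating point.
        load = load * DECAY_1MIN + active_tasks * (1.0 - DECAY_1MIN)
    return load


if __name__ == "__main__":
    for secs in (5, 30, 60, 120, 300):
        print(f"{secs:>4} s of one pegged task -> 1-min load {one_minute_load(1, secs):.2f}")
```

Starting from an idle machine, one task pegged for a full minute only brings the 1-minute load to 1 - e^-1, about 0.63; it takes several minutes to approach 1.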
I've seen a server have a load of over 15,000 (yes really that's not a typo) and a CPU % of close to 0%. It happened because a Samba share was having issues and lots and lots of clients started getting stuck in an IO wait state. Chances are if you are seeing a regular high load number with no corresponding CPU activity, you are having a storage problem of some kind. On virtual machines this can also mean that there are other VMs heavily competing for storage resources on the same VM host.
High load is also not necessarily a bad thing; most of the time it just means the system is being utilized to its fullest capacity, or maybe is beyond its capability to keep up (if the load number is higher than the number of processor cores). At a place where I used to be a sysadmin, they had someone who watched the load average on their primary system more closely than Nagios did. When the load was high, they would call me 24/7 faster than you could say SMTP. Most of the time nothing was actually wrong, but they associated the load number with something being wrong and watched it like a hawk. After checking, my response was usually that the system was just doing its job. Of course, this was the same place where the load got up over 15000 (not the same server, though), so sometimes it does mean something is wrong. You have to consider the purpose of your system. If it's a workhorse, then expect the load to be naturally high.
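Comparing the load number with the number of processor cores, as described above (and as Brian suggests in the question's comments), is easiest when each load average is divided by the number of logical CPUs. A minimal sketch, assuming the usual five-field format of /proc/loadavg (1-, 5- and 15-minute averages, running/total scheduling entities, last PID); parse_loadavg and normalized_load are hypothetical helper names.

```python
import os


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def normalized_load(text: str, ncpu: int) -> tuple[float, ...]:
    """Divide each load average by the number of logical CPUs (24 in the question)."""
    return tuple(round(value / ncpu, 3) for value in parse_loadavg(text))


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        raw = f.read()
    # os.cpu_count() reports logical CPUs, so each hyperthread counts as one.
    print(normalized_load(raw, os.cpu_count() or 1))
```

A normalized value near 1.0 means, on average, about one active task per logical CPU.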
How do you mean that I can have a load of 1 and 100% CPU with a single-threaded process? What kind of threads are you talking about? If we consider our Java processes, they have tons of threads, but I was under the assumption that threads are treated as processes from the perspective of the OS (they have separate PIDs on Linux, after all). Could it be that a single multi-threaded Java process is only counted as one task from a load average perspective?
– K Erlandsson
Feb 13 '15 at 9:17
I just did a test on my own: the threads in a Java process contribute to the load average as if they were separate processes (i.e. a Java class that runs 10 threads in a busy-wait loop gives me a load close to 10). I would appreciate a clarification about the threaded process you mentioned above. Thank you!
– K Erlandsson
Feb 13 '15 at 9:26
I mean a non-multithreaded process (i.e., one that only uses a single CPU at a time). For instance, if you write a simple C program that runs a busy loop, it's just a single thread running, and it uses only 1 CPU at a time.
– deltaray
Feb 13 '15 at 20:36
All information I have found says that threads count as separate processes when seen from the kernel and when calculating load. Hence I fail to see how I could have a multi threaded process on full tilt resulting in 1 load and 100% CPU on a multi-CPU system. Could you please help me understand how you mean?
– K Erlandsson
Feb 14 '15 at 13:13
For anyone looking for more detail: "Linux Load Averages: Solving the Mystery" by Brendan Gregg had all the answers I ever needed.
– Nickolay
Sep 12 '18 at 12:50
Load is a very deceptive number. Take it with a grain of salt.
If you spawn many tasks in very quick succession which complete very quickly, the number of processes in the run queue is too small to register the load for them (the kernel counts load once every five seconds).
Consider this example: on my host, which has 8 logical cores, this Python script registers a large CPU usage in top (about 85%), yet hardly any load.
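import os, sys

while True:
    # Fork one short-lived child per logical CPU (8 on this host)
    for j in range(8):
        parent = os.fork()
        if not parent:
            # Child: count to 10000, then exit immediately
            n = 0
            for i in range(10000):
                n += 1
            sys.exit(0)
    # Parent: reap the whole batch of 8 children before forking the next one
    for j in range(8):
        os.wait()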
Another implementation; this one avoids waiting in groups of 8 (which would skew the test). Here the parent always tries to keep the number of children equal to the number of online CPUs, so it is much busier than the first method and hopefully more accurate.
/* Compile with -O0 so the counting loop is not optimized away */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <err.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ITERATIONS 50000

int maxchild = 0;
volatile int numspawned = 0;

void childhandle(int sig)
{
    int stat;
    /* Handle all exited children, until none are left to handle */
    while (waitpid(-1, &stat, WNOHANG) > 0)
        numspawned--;
}

/* Stupid task for our children to do */
void do_task(void)
{
    int i, j = 0;   /* j must be initialized before it is incremented */
    for (i = 0; i < ITERATIONS; i++)
        j++;
    exit(0);
}

int main(void)
{
    pid_t pid;
    struct sigaction act;
    sigset_t sigs, old;

    maxchild = sysconf(_SC_NPROCESSORS_ONLN);

    /* Setup child handler */
    memset(&act, 0, sizeof(act));
    act.sa_handler = childhandle;
    if (sigaction(SIGCHLD, &act, NULL) < 0)
        err(EXIT_FAILURE, "sigaction");

    /* Defer the SIGCHLD signal */
    sigemptyset(&sigs);
    sigaddset(&sigs, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
        err(EXIT_FAILURE, "sigprocmask");

    /* Create processes while our maxchild value is not met */
    while (1) {
        while (numspawned < maxchild) {
            pid = fork();
            if (pid < 0)
                err(EXIT_FAILURE, "fork");
            else if (pid == 0) /* child process */
                do_task();
            else /* parent */
                numspawned++;
        }
        /* Atomically unblocks the signal; the handler picks it up, then it is reblocked on return */
        if (sigsuspend(&old) < 0 && errno != EINTR)
            err(EXIT_FAILURE, "sigsuspend");
    }
}
The reason for this behaviour is that the algorithm spends more time creating child processes than it does running the actual task (a short counting loop). Tasks that have not yet been created cannot count towards the 'runnable' state, yet they take up %sys CPU time while they are being spawned.
So the answer in your case could really be that whatever work is being done spawns large numbers of tasks (threads or processes) in quick succession.
Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded; we only see a very slight increase in system time). Could many small tasks be the explanation anyway? If the load average is sampled every 5 seconds, is the CPU utilization data given by mpstat sampled more frequently?
– K Erlandsson
Feb 12 '15 at 13:23
I am not familiar with how CPU sampling is done there. Never read the kernel source regarding it. In my example %usr was 70%+ and %sys was 15%.
– Matthew Ife
Feb 12 '15 at 13:30
Good examples!
– Xavier Lucas
Feb 12 '15 at 18:06
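On the sampling question in the comments above: CPU utilization tools such as mpstat work from the cumulative time counters in /proc/stat and report the difference between two snapshots, so utilization is not a 5-second snapshot of the run queue. On a 2.6.32 kernel those counters are, as far as I know, updated at every timer tick, which is far finer-grained than the load sampler. The sketch below is an approximation of a %usr calculation, not mpstat's exact code (mpstat also separates guest time); the helper names are hypothetical.

```python
# Column order of the aggregate "cpu" line in /proc/stat (values in clock ticks).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line; the guest columns are ignored (already included in user)."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return {name: int(value) for name, value in zip(FIELDS, parts[1:])}


def usr_percent(before: str, after: str) -> float:
    """Share of elapsed CPU time spent in user mode between two snapshots, averaged over all CPUs."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["user"] / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


if __name__ == "__main__":
    import os
    import time

    if os.path.exists("/proc/stat"):
        first = read_cpu_line()
        time.sleep(1)
        print(f"%usr over 1 s: {usr_percent(first, read_cpu_line()):.1f}")
```

Because every tick of user time between the two snapshots is counted, even very short bursts of work show up in %usr, whereas they only show up in the load average if they happen to be runnable at a sampling instant.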
If the load average doesn't increase much, it just means that your hardware specs and the nature of the tasks to be processed result in good overall throughput, which keeps tasks from piling up in the task queue for long.
If there were contention because, for instance, the average task complexity is too high or the average task takes too many CPU cycles to process, then yes, the load average would increase.
UPDATE:
It may not be clear in my original answer, so I'm clarifying now:
The load average is an average over time of: tasks running + tasks waiting (for cores) + tasks blocked.
You can definitely have good throughput and get close to a load average of 24 without any penalty on task processing time. On the other hand, you can also have 2-4 periodic tasks not completing quickly enough; then you will see the number of tasks waiting (for CPU cycles) grow, and you will eventually reach a high load average. Another thing that can happen is tasks running outstanding synchronous I/O operations and thereby blocking a core, lowering throughput and making the waiting-task queue grow (in that case you may see the iowait metric change).
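The breakdown above can be written out as a small model of a single instantaneous sample, which the kernel then folds into the 1-, 5- and 15-minute averages. A minimal sketch: LoadSnapshot and snapshot are hypothetical names, and "blocked" is assumed to mean uninterruptible sleep (for example synchronous disk I/O), as on Linux.

```python
from dataclasses import dataclass


@dataclass
class LoadSnapshot:
    running: int  # tasks currently on a CPU
    waiting: int  # runnable tasks queued for a free CPU
    blocked: int  # tasks in uninterruptible sleep

    @property
    def instantaneous_load(self) -> int:
        # The value sampled for the load average: running + waiting + blocked.
        return self.running + self.waiting + self.blocked


def snapshot(runnable: int, blocked: int, ncpu: int) -> LoadSnapshot:
    """Split a count of runnable tasks into those on a CPU and those queued for one."""
    running = min(runnable, ncpu)
    return LoadSnapshot(running=running, waiting=runnable - running, blocked=blocked)
```

With 24 logical CPUs, snapshot(24, 0, 24) gives an instantaneous load of 24 with nothing waiting, i.e. full use of the hardware without contention; snapshot(30, 2, 24) gives 32, of which only 6 tasks are actually queued for a CPU.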
It is my understanding that load average also includes the tasks currently executing. That would mean we definitely can have an increase in load average without actual contention for the CPUs. Or am I mistaken/misunderstanding you?
– K Erlandsson
Feb 12 '15 at 13:24
@KristofferE You are completely right. The actual formula is loadavg = tasks running + tasks waiting (for available cores) + tasks blocked. This means you can have a load average of 24 with no tasks waiting or blocked, which is just "full usage" of your hardware capacity without any contention. As you seemed confused about load average vs. number of processes running vs. CPU usage, I mainly focused my answer on how a load average can still grow with so few running processes overall. Re-reading it, it may indeed not be that clear.
– Xavier Lucas
Feb 12 '15 at 14:26
Load average includes tasks that are blocked on disk I/O, so you can easily have zero CPU utilization and a load average of 10 just by having 10 tasks all trying to read from a very slow disk. Thus it is common for a busy server to start thrashing the disk: all of the seeking causes lots of blocked tasks, driving up the load average while CPU usage drops, since all of the tasks are blocked on the disk.
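To see which tasks are inflating the load this way, you can look for processes in uninterruptible sleep (state D). A minimal sketch, assuming the standard /proc/&lt;pid&gt;/stat layout (pid, command name in parentheses, then the state letter); it only scans processes, so individual threads would need the same check on /proc/&lt;pid&gt;/task/&lt;tid&gt;/stat. The helper names are hypothetical.

```python
import os


def task_state(stat_text: str) -> tuple[str, str]:
    """Return (command name, state letter) from the contents of /proc/<pid>/stat."""
    # The command name is in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    name = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return name, state


def blocked_tasks(proc: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, name) of processes in uninterruptible sleep ('D'), which count toward load."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                name, state = task_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            result.append((int(entry), name))
    return result


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, name in blocked_tasks():
        print(pid, name)
```

Run it a few times in a row: a persistent list of D-state tasks during a high-load, low-CPU period points at storage rather than CPU.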
While Matthew Ife's answer was very helpful and led us in the right direction, it was not exactly what caused the behavior in our case. We have a multi-threaded Java application that uses thread pooling, so no work is spent creating the actual tasks.
However, the actual work the threads do is short-lived and includes I/O waits or synchronization waits. As Matthew mentions in his answer, the load average is sampled by the OS, so short-lived tasks can be missed.
I made a Java program that reproduces the behavior. The following Java class generates a CPU utilization of 28% (650% stacked) on one of our servers. While it runs, the load average is about 1.3. The key here is the sleep() inside the thread; without it, the load calculation is correct.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MultiThreadLoad {

    private ThreadPoolExecutor e = new ThreadPoolExecutor(200, 200, 0L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(1000), new ThreadPoolExecutor.CallerRunsPolicy());

    public void load() {
        while (true) {
            e.execute(new Runnable() {
                @Override
                public void run() {
                    // Idle first, then do a short burst of CPU work
                    sleep100Ms();
                    for (long i = 0; i < 5000000L; i++)
                        ;
                }

                private void sleep100Ms() {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
    }

    public static void main(String[] args) {
        new MultiThreadLoad().load();
    }
}
To summarize, the theory is that the threads in our applications idle a lot and then perform short-lived work, which is why the tasks are not correctly sampled by the load average calculation.
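The sampling theory can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel's actual timer or scheduler behavior: it assumes a hypothetical set of 24 workers that all wake in lockstep every second and compute for 0.75 s (75% of 24 CPUs, as in the question), and a load sampler that always fires at the same phase of that cycle. The helper names are hypothetical.

```python
def runnable_workers(t: float, workers: int, period: float = 1.0, busy: float = 0.75) -> int:
    """Toy model: every worker wakes at each multiple of `period` and computes for `busy` s."""
    return workers if (t % period) < busy else 0


def time_average_load(workers: int, period: float = 1.0, busy: float = 0.75) -> float:
    """What an unbiased time average of the run queue would show: workers * busy fraction."""
    return workers * busy / period


def sampled_load(workers: int, offset: float, interval: float = 5.0, samples: int = 720) -> float:
    """Average of point samples taken every `interval` seconds, starting at `offset`."""
    total = sum(runnable_workers(offset + k * interval, workers) for k in range(samples))
    return total / samples


if __name__ == "__main__":
    print("time-averaged run queue:", time_average_load(24))
    for offset in (0.5, 0.9):
        print(f"samples at phase {offset}: {sampled_load(24, offset):.1f}")
```

Depending only on the sampling phase, the same workload reads as 24 or as 0 against a true average of 18. Real timers drift, so actual readings fall in between, but work that tends to run between samples can bias the load average downward in the way described above.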
Load average is the average number of processes in the CPU queue. It is specific to each system: you cannot say that one LA is generically high on all systems and another is low.
So you have 12 cores, and for LA to increase significantly the number of processes must be really high.
Another question is what is meant by the "CPU Usage" graph. If it's taken from SNMP, as it should be, and your SNMP implementation is net-snmp, then it just stacks the CPU load from each of your 12 CPUs. So for net-snmp the total amount of CPU load is 1200%.
If my assumptions are correct, then the CPU usage didn't increase significantly.
Thus, LA didn't increase significantly.
The CPU usage is taken from mpstat, the all row. I am fairly certain it is an average across all CPUs; it is not stacked. For example, when the problem occurs, top shows 2000% CPU usage for one process. That is stacked usage.
– K Erlandsson
Feb 12 '15 at 12:31
The scenario here is not particularly unexpected, although it is a little unusual. What Xavier touches on, but does not develop much, is that although Linux (by default) and most flavours of Unix implement pre-emptive multitasking, on a healthy machine tasks will rarely be pre-empted. Each task is allotted a time slice for occupying the CPU; it is only pre-empted if it exceeds this time and there are other tasks waiting to run (note that load reports the average number of processes both on the CPU and waiting to run). Most of the time, a process will yield rather than being interrupted.
(In general you only need to worry about load when it gets close to the number of CPUs, i.e. when the scheduler starts pre-empting tasks.)
if our CPUs are busy 75% of the time, shouldn't we see higher load average?
It's all about the pattern of activity. Clearly the increased CPU utilization by some tasks (most likely a small minority) was not having an adverse effect on the processing of other tasks. If you could isolate the transactions being processed, I would expect you to see a new group emerging during the slowdown, while the existing task set was not affected.
Update:
One common scenario where high CPU can occur without a big increase in load is where a task triggers one (or a sequence) of other tasks, e.g. on receipt of a network request, the handler routes the request to a separate thread, and the separate thread then makes some asynchronous calls to other processes. The sampling of the run queue causes the load to be reported lower than it really is, but it does not rise linearly with CPU usage: the chain of triggered tasks would not have been runnable without the initial event, and because they occur (more or less) sequentially, the run queue is not inflated.
The OP originally provided indications that the aggregate CPU% was "2000%", suggesting there are many tasks using up CPU rather than just 1 busy process. If it was a consistent 2000% for a minute, you'd normally expect the load to be 20-ish.
– Matthew Ife
Feb 12 '15 at 14:10
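The arithmetic behind this comment can be spelled out. A minimal sketch, assuming top's convention that 100% equals one fully busy logical CPU and that busy threads would be counted by an unbiased load sampler; the helper names are hypothetical.

```python
def stacked_to_average(stacked_percent: float, ncpu: int) -> float:
    """Convert top-style CPU% (100% per busy logical CPU) into an average across all CPUs."""
    return stacked_percent / ncpu


def expected_load(stacked_percent: float) -> float:
    """Average number of tasks on a CPU implied by a sustained stacked CPU%."""
    return stacked_percent / 100.0


if __name__ == "__main__":
    # Figures from the question: 2000% in top on a machine with 24 logical CPUs.
    print(f"average utilization: {stacked_to_average(2000, 24):.1f}%")
    print(f"implied load: {expected_load(2000):.1f}")
```

For the question's numbers, 2000% over 24 logical CPUs is about 83% average utilization, consistent with the roughly 75% in the chart, and it implies about 20 tasks on a CPU at any moment, far above the observed 2-4. The same reasoning applied to the later Java reproduction (650% stacked) suggests a load of about 6.5 against the observed 1.3.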
...in a comment, not in the question, and he's not very sure about that. In the absence of the 'ALL' option, mpstat reports the total % usage, not the average. But that doesn't change the answer; it's about the pattern of activity.
– symcbean
Feb 12 '15 at 14:40
I'm 100% positive that the CPU utilization we see in the chart is the "average per CPU". mpstat is run without ALL, but that only leaves out the per-CPU info; the all row still shows the average per CPU. I will clarify the question.
– K Erlandsson
Feb 12 '15 at 14:43
Could you please elaborate on your last section a bit? I fail to grasp what you mean, and the part of my question you cited is the part I have the most trouble understanding.
– K Erlandsson
Feb 12 '15 at 14:44
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "2"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f667078%2fhigh-cpu-utilization-but-low-load-average%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
7 Answers
7
active
oldest
votes
7 Answers
7
active
oldest
votes
active
oldest
votes
active
oldest
votes
On Linux at least, the load average and CPU utilization are actually two different things. Load average is a measurement of how many tasks are waiting in a kernel run queue (not just CPU time but also disk activity) over a period of time. CPU utilization is a measure of how busy the CPU is right now. The most load that a single CPU thread pegged at 100% for one minute can "contribute" to the 1 minute load average is 1. A 4 core CPU with hyperthreading (8 virtual cores) all at 100% for 1 minute would contribute 8 to the 1 minute load average.
Often times these two numbers have patterns that correlate to each other, but you can't think of them as the same. You can have a high load with nearly 0% CPU utilization (such as when you have a lot of IO data stuck in a wait state) and you can have a load of 1 and 100% CPU, when you have a single threaded process running full tilt. Also for short periods of time you can see the CPU at close to 100% but the load is still below 1 because the average metrics haven't "caught up" yet.
I've seen a server have a load of over 15,000 (yes really that's not a typo) and a CPU % of close to 0%. It happened because a Samba share was having issues and lots and lots of clients started getting stuck in an IO wait state. Chances are if you are seeing a regular high load number with no corresponding CPU activity, you are having a storage problem of some kind. On virtual machines this can also mean that there are other VMs heavily competing for storage resources on the same VM host.
High load is also not necessarily a bad thing, most of the time it just means the system is being utilized to it's fullest capacity or maybe is beyond it's capability to keep up (if the load number is higher than the number of processor cores). At a place I used to be a sysadmin, they had someone who watched the load average on their primary system closer than Nagios did. When the load was high, they would call me 24/7 faster than you could say SMTP. Most of the time nothing was actually wrong, but they associated the load number with something being wrong and watched it like a hawk. After checking, my response was usually that the system was just doing it's job. Of course this was the same place where the load got up over 15000 (not the same server though) so sometimes it does mean something is wrong. You have to consider the purpose of your system. If it's a workhorse, then expect the load to be naturally high.
edited 47 mins ago
answered Feb 12 '15 at 21:38
deltaray
980613
How do you mean that I can have a load of 1 and 100% CPU with a single-threaded process? What kind of threads are you talking about? If we consider our Java processes, they have tons of threads, but I was under the assumption that threads are treated as processes from the perspective of the OS (they have separate PIDs on Linux, after all). Could it be that a single multithreaded Java process is counted as only one task from a load average perspective?
– K Erlandsson
Feb 13 '15 at 9:17
I just did a test on my own: the threads in a Java process contribute to the load average as if they were separate processes (i.e. a Java class that runs 10 threads in a busy-wait loop gives me a load close to 10). I would appreciate a clarification about the threaded process you mentioned above. Thank you!
– K Erlandsson
Feb 13 '15 at 9:26
I mean a non-multithreaded process (i.e., one that uses only a single CPU at a time). For instance, if you write a simple C program that runs a busy loop, it's just a single thread running, and it uses only 1 CPU at a time.
– deltaray
Feb 13 '15 at 20:36
All the information I have found says that threads count as separate processes when seen from the kernel and when calculating load. Hence I fail to see how I could have a multithreaded process running full tilt resulting in a load of 1 and 100% CPU on a multi-CPU system. Could you please help me understand what you mean?
– K Erlandsson
Feb 14 '15 at 13:13
For anyone looking for more detail: "Linux Load Averages: Solving the Mystery" by Brendan Gregg had all the answers I ever needed.
– Nickolay
Sep 12 '18 at 12:50
Load is a very deceptive number. Take it with a grain of salt.
If you spawn many tasks in very quick succession and each one completes very quickly, few of them are in the run queue at the moment the kernel samples it, so they barely register in the load (the kernel samples the run queue only once every five seconds).
Consider this example: on my host, which has 8 logical cores, this Python script registers high CPU usage in top (about 85%), yet hardly any load.
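import os

while True:
    # Fork 8 short-lived children (one per logical core on this host)
    for j in range(8):
        pid = os.fork()
        if pid == 0:
            # Child: do a tiny amount of work, then exit immediately
            n = 0
            for i in range(10000):
                n += 1
            os._exit(0)
    # Parent: reap all 8 children before starting the next batch
    for j in range(8):
        os.wait()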
Another implementation: this one avoids waiting in groups of 8 (which would skew the test). Here the parent always tries to keep the number of children equal to the number of online CPUs, so it is much busier than the first method and hopefully more accurate.
/* Compile with flags -O0 */
#include <err.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ITERATIONS 50000

int maxchild = 0;
volatile sig_atomic_t numspawned = 0;

/* SIGCHLD handler */
void childhandle(int sig)
{
    int stat;
    int saved_errno = errno; /* waitpid() may clobber errno */
    (void)sig;
    /* Handle all exited children, until none are left to handle */
    while (waitpid(-1, &stat, WNOHANG) > 0)
        numspawned--;
    errno = saved_errno;
}

/* Stupid task for our children to do */
void do_task(void)
{
    int i;
    volatile int j = 0; /* initialized, and volatile so the loop is kept */
    for (i = 0; i < ITERATIONS; i++)
        j++;
    exit(0);
}

int main(void)
{
    pid_t pid;
    struct sigaction act;
    sigset_t sigs, old;

    maxchild = sysconf(_SC_NPROCESSORS_ONLN);

    /* Setup child handler */
    memset(&act, 0, sizeof(act));
    act.sa_handler = childhandle;
    if (sigaction(SIGCHLD, &act, NULL) < 0)
        err(EXIT_FAILURE, "sigaction");

    /* Defer the sigchild signal */
    sigemptyset(&sigs);
    sigaddset(&sigs, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &sigs, &old) < 0)
        err(EXIT_FAILURE, "sigprocmask");

    /* Create processes, where our maxchild value is not met */
    while (1) {
        while (numspawned < maxchild) {
            pid = fork();
            if (pid < 0)
                err(EXIT_FAILURE, "fork");
            else if (pid == 0) /* child process */
                do_task();
            else /* parent */
                numspawned++;
        }
        /* Atomically unblocks signal, handler then picks it up, reblocks on finish */
        if (sigsuspend(&old) < 0 && errno != EINTR)
            err(EXIT_FAILURE, "sigsuspend");
    }
}
The reason for this behaviour is that the algorithm spends more time creating child processes than running the actual task (counting to 10,000 in the Python version). Tasks not yet created cannot count towards the 'runnable' state, yet spawning them consumes %sys CPU time.
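To illustrate how point sampling can miss short-lived work, here is a deliberately simplified Python model, not the kernel's actual accounting: the run queue is inspected only at fixed 5-second instants, and each task is given a hypothetical (start, end) lifetime. The timings are contrived so that no task is alive exactly at a sample instant. With randomly timed tasks the sampled count would, on average, track the busy count, so treat this as a picture of the sampling blind spot rather than a model of a real workload.

```python
SAMPLE_INTERVAL = 5.0  # seconds between load samples in this toy model


def sampled_vs_busy(tasks, duration, interval=SAMPLE_INTERVAL):
    """Compare what point sampling sees with how busy the CPUs really were.

    tasks: list of (start, end) times in seconds during which a task is runnable.
    Returns (mean runnable count seen at the sample instants,
             mean number of tasks running over the whole duration).
    """
    instants = [i * interval for i in range(int(duration // interval))]
    seen = [sum(1 for start, end in tasks if start <= t < end) for t in instants]
    busy_seconds = sum(end - start for start, end in tasks)
    return sum(seen) / len(instants), busy_seconds / duration


# 8 short-lived children every 10 ms, each running for 8 ms (about 80% of
# 8 CPUs), offset by 1 ms so none is alive at a multiple of 5 seconds.
tasks = [(k * 0.01 + 0.001, k * 0.01 + 0.009) for k in range(6000) for _ in range(8)]
seen, busy = sampled_vs_busy(tasks, duration=60.0)
print(f"sampled runnable: {seen:.1f}, actually busy: {busy:.1f}")
```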
So, in your case the answer could really be that whatever work is being done spawns large numbers of short-lived tasks (threads or processes) in quick succession.
edited Feb 12 '15 at 14:08
answered Feb 12 '15 at 13:05
Matthew Ife
20.5k24663
Thank you for the suggestion. The chart in my question shows %user time (CPU system time is excluded; we see only a very slight increase in system time). Could many small tasks be the explanation anyway? If the load average is sampled every 5 seconds, is the CPU utilization data given by mpstat sampled more frequently?
– K Erlandsson
Feb 12 '15 at 13:23
I am not familiar with how CPU sampling is done there. Never read the kernel source regarding it. In my example %usr was 70%+ and %sys was 15%.
– Matthew Ife
Feb 12 '15 at 13:30
Good examples!
– Xavier Lucas
Feb 12 '15 at 18:06
If the load average doesn't increase much, it just means that your hardware specs and the nature of the tasks being processed result in good overall throughput, which keeps tasks from piling up in the task queue.
If there were contention, for instance because the average task is too complex or takes too many CPU cycles to process, then yes, the load average would increase.
UPDATE:
It may not have been clear in my original answer, so I'm clarifying now:
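The formula behind the load average calculation is: loadavg = tasks running + tasks waiting (for cores) + tasks blocked. Strictly speaking, that is the count the kernel samples; the reported load average is an exponentially damped average of it over 1, 5 and 15 minutes.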
You can definitely have good throughput and get close to a load average of 24 without any penalty on task processing time. On the other hand, you can also have 2-4 periodic tasks that don't complete quickly enough; then you will see the number of tasks waiting (for CPU cycles) grow, and you will eventually reach a high load average. Another thing that can happen is tasks stalling on outstanding synchronous I/O operations, lowering throughput and making the waiting task queue grow (in that case you may see the iowait metric change).
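To illustrate how that count becomes the number shown by uptime: on Linux the kernel samples it roughly every 5 seconds and applies an exponentially damped moving average, with a decay factor of exp(-5/60) per sample for the 1-minute figure. The sketch below is a simplified floating-point model (the kernel itself uses fixed-point arithmetic); the class name and the "24 always-running threads" scenario are assumptions for illustration.

```java
// Simplified model of a Linux load average (floating point; the kernel uses
// fixed-point arithmetic, but the recurrence is the same idea).
final class LoadAverageModel {
    static final double SAMPLE_SECONDS = 5.0;

    private final double decay; // exp(-5s / window)
    private double load;        // starts at 0, like a freshly booted machine

    LoadAverageModel(double windowSeconds) {
        this.decay = Math.exp(-SAMPLE_SECONDS / windowSeconds);
    }

    // active = tasks running + tasks waiting for a CPU + tasks in uninterruptible
    // sleep, counted at the sampling instant.
    double sample(int active) {
        load = load * decay + active * (1.0 - decay);
        return load;
    }

    public static void main(String[] args) {
        LoadAverageModel oneMinute = new LoadAverageModel(60);
        // 24 threads that are always running: the load climbs toward 24
        // even if none of them ever waits for a CPU.
        for (int i = 1; i <= 36; i++) { // 36 samples = 3 minutes
            double l = oneMinute.sample(24);
            if (i % 12 == 0) {
                System.out.printf("after %d min: %.2f%n", i / 12, l);
            }
        }
    }
}
```

After one minute of constant input the 1-minute value has only reached about 63% (1 - 1/e) of the steady-state count, which is one reason short bursts barely move it.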
It is my understanding that load average also includes the tasks currently executing. That would mean we definitely can have an increase in load average without actual contention for the CPUs. Or am I mistaken/misunderstanding you?
– K Erlandsson
Feb 12 '15 at 13:24
@KristofferE You are completely right. The actual formula is loadavg = tasks running + tasks waiting (for available cores) + tasks blocked. This means you can have a load average of 24 with no task waiting or blocked, thus having just "full usage" of your hardware capacity without any contention. As you seemed confused about load average vs. number of processes running vs. CPU usage, I mainly focused my answer on explaining how a load average can still grow with so few running processes overall. It may indeed not be that clear after re-reading it.
– Xavier Lucas
Feb 12 '15 at 14:26
edited Feb 12 '15 at 14:37
answered Feb 12 '15 at 13:00
Xavier Lucas
Load average includes tasks that are blocked on disk I/O, so you can easily have zero CPU utilization and a load average of 10 just by having 10 tasks all trying to read from a very slow disk. Thus it is common for a busy server to start thrashing the disk: all of the seeking causes lots of blocked tasks, driving up the load average while CPU usage drops, since all of the tasks are blocked on the disk.
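A quick way to check this on Linux is to count the threads currently in state R (running or runnable) and D (uninterruptible sleep, usually disk I/O), the two states that feed the load average. The sketch below is hypothetical helper code, not a standard tool; it reads /proc/&lt;pid&gt;/task/&lt;tid&gt;/stat and takes the state letter that follows the last closing parenthesis, because the command name itself may contain spaces or parentheses.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: instantaneous count of R and D threads (Linux only).
final class RunQueueSnapshot {

    // The state letter is the first field after the last ')' in a stat line,
    // e.g. "1234 (my prog) D 1 ..." -> 'D'.
    static char stateOf(String statLine) {
        int close = statLine.lastIndexOf(')');
        return statLine.substring(close + 1).trim().charAt(0);
    }

    // Running/runnable and uninterruptible threads are what the load average counts.
    static boolean countsTowardLoad(char state) {
        return state == 'R' || state == 'D';
    }

    public static void main(String[] args) throws IOException {
        int running = 0, blocked = 0;
        // Numeric entries in /proc are process IDs; each has a task/ directory of threads.
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path proc : procs) {
                try (DirectoryStream<Path> tids = Files.newDirectoryStream(proc.resolve("task"))) {
                    for (Path tid : tids) {
                        char s = stateOf(new String(Files.readAllBytes(tid.resolve("stat"))));
                        if (countsTowardLoad(s)) {
                            if (s == 'R') running++; else blocked++;
                        }
                    }
                } catch (IOException | RuntimeException e) {
                    // The process or thread exited (or is unreadable) while we scanned it; skip it.
                }
            }
        }
        System.out.printf("R: %d, D: %d (instantaneous, not averaged)%n", running, blocked);
    }
}
```

Running this a few times while the disk is thrashing should show the D count rising while little is in R, matching the scenario described above.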
answered Feb 12 '15 at 20:34
psusi
While Matthew Ife's answer was very helpful and led us in the right direction, it was not exactly what caused the behavior in our case. In our case we have a multi-threaded Java application that uses thread pooling, which is why no work is done creating the actual tasks.
However, the actual work the threads do is short-lived and includes I/O waits or synchronization waits. As Matthew mentions in his answer, the load average is sampled by the OS, so short-lived tasks can be missed.
I made a Java program that reproduces the behavior. The following Java class generates a CPU utilization of 28% (650% stacked) on one of our servers. While it runs, the load average is about 1.3. The key here is the sleep() inside the thread; without it, the load calculation is correct.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MultiThreadLoad {

    // 200 pooled threads and a bounded queue; when the queue is full, the caller runs the task itself.
    private final ThreadPoolExecutor e = new ThreadPoolExecutor(200, 200, 0L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(1000), new ThreadPoolExecutor.CallerRunsPolicy());

    public void load() {
        while (true) {
            e.execute(new Runnable() {
                @Override
                public void run() {
                    // Idle first: a sleeping thread is not runnable, so it is not counted in the load.
                    sleep100Ms();
                    // Then burn CPU briefly with an empty busy loop.
                    for (long i = 0; i < 5_000_000L; i++)
                        ;
                }

                private void sleep100Ms() {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException ex) {
                        throw new RuntimeException(ex);
                    }
                }
            });
        }
    }

    public static void main(String[] args) {
        new MultiThreadLoad().load();
    }
}
To summarize, the theory is that the threads in our application idle a lot and then perform short-lived work, which is why these tasks are not correctly sampled by the load average calculation.
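The effect described above can be illustrated with a toy model (not the kernel's actual sampling code): a thread that is busy for a fixed slice of every period, observed either by accumulating busy time, as utilization counters effectively do, or by looking only at a few instants, as the load average does. The period, offset and 5000 ms sampling interval below are assumptions chosen to make the aliasing obvious; on a real system the effect is statistical rather than this clean.

```java
// Toy model of why point samples can miss periodic, short-lived work.
// A thread is busy during [busyStart, busyStart + busyLength) of every period (all in ms).
final class SamplingAliasDemo {

    static boolean busyAt(long t, long period, long busyStart, long busyLength) {
        long phase = t % period;
        return phase >= busyStart && phase < busyStart + busyLength;
    }

    // Fraction of wall time the thread is busy (what a utilization counter accumulates).
    static double dutyCycle(long period, long busyLength) {
        return (double) busyLength / period;
    }

    // Fraction of point samples, taken every sampleEvery ms, that catch the thread busy.
    static double sampledBusyFraction(int samples, long sampleEvery,
                                      long period, long busyStart, long busyLength) {
        int hits = 0;
        for (int i = 1; i <= samples; i++) {
            if (busyAt(i * sampleEvery, period, busyStart, busyLength)) {
                hits++;
            }
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        // Busy 30 ms out of every 100 ms, starting 50 ms into each period;
        // a sample every 5000 ms always lands at phase 0, so it never sees the work.
        System.out.printf("utilization: %.0f%%%n", 100 * dutyCycle(100, 30));
        System.out.printf("seen by samples: %.0f%%%n",
                100 * sampledBusyFraction(120, 5000, 100, 50, 30));
    }
}
```

Shifting busyStart to 0 flips the result to every sample seeing the thread busy, which shows the sampled value depends on timing alignment, not on how much CPU is actually used.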
answered Feb 17 '15 at 8:45
K Erlandsson
Load average is the average number of processes in the CPU queue. It is specific to each system; you cannot say that a given LA is generically high on all systems and another is low.
So you have 12 cores, and for the LA to increase significantly, the number of processes must be really high.
Another question is what is meant by the "CPU Usage" graph. If it's taken from SNMP, as it should be, and your SNMP implementation is net-snmp, then it just stacks the CPU load from each of your 12 CPUs. So for net-snmp the total amount of CPU load is 1200%.
If my assumptions are correct, then the CPU usage didn't increase significantly.
Thus, the LA didn't increase significantly.
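The difference between stacked and averaged figures is just a division by the number of CPUs. A minimal sketch follows; the CPU counts are assumptions (12 cores as in this answer, or more logical CPUs if hyper-threading is enabled), and the last method encodes a rule of thumb rather than anything a tool reports: a sustained stacked figure of P% means roughly P/100 threads on a CPU at any moment, which the load average would normally also reflect.

```java
// Converting between "stacked" CPU percentages (top/net-snmp style: 100% per CPU)
// and the per-CPU average shown in mpstat's "all" row.
final class CpuPercent {

    static double stackedToAverage(double stackedPercent, int cpus) {
        return stackedPercent / cpus;
    }

    static double averageToStacked(double averagePercent, int cpus) {
        return averagePercent * cpus;
    }

    // Rule of thumb: a sustained stacked P% means about P/100 threads on a CPU at once.
    static double impliedBusyThreads(double stackedPercent) {
        return stackedPercent / 100.0;
    }

    public static void main(String[] args) {
        int cpus = 12; // assumption: the core count mentioned in this answer
        System.out.println(stackedToAverage(1200, cpus)); // prints 100.0 (fully busy box)
        System.out.println(impliedBusyThreads(2000));     // prints 20.0 (top showing 2000%)
    }
}
```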
The CPU usage is taken from mpstat, the all row. I am fairly certain it is an average across all CPUs; it is not stacked. For example, when the problem occurs, top shows 2000% CPU usage for one process. That is stacked usage.
– K Erlandsson
Feb 12 '15 at 12:31
answered Feb 12 '15 at 12:21
drookie
The scenario here is not particularly unexpected, although it is a little unusual. What Xavier touches on, but does not develop much, is that although Linux (by default) and most flavours of Unix implement pre-emptive multi-tasking, on a healthy machine tasks will rarely be pre-empted. Each task is allotted a time slice for occupying the CPU; it is only pre-empted if it exceeds this time and there are other tasks waiting to run (note that load reports the average number of processes both on the CPU and waiting to run). Most of the time, a process will yield rather than being interrupted.
(In general you only need to worry about load when it gets close to the number of CPUs, i.e. when the scheduler starts pre-empting tasks.)
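Following the rule of thumb in the parenthetical above, a JVM-based service could compare the 1-minute load with the CPU count using the standard java.lang.management API. This is a sketch; the class name is hypothetical, and the threshold of 1.0 load per CPU is an assumption taken from the rule of thumb, not a hard limit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Compares the 1-minute load average with the number of CPUs the JVM can see.
final class LoadPerCpu {

    // Hypothetical threshold from the rule of thumb: worry once load approaches the CPU count.
    static final double SATURATION = 1.0;

    static double loadPerCpu(double loadAverage, int cpus) {
        return loadAverage / cpus;
    }

    static boolean nearSaturation(double loadAverage, int cpus) {
        return loadPerCpu(loadAverage, cpus) >= SATURATION;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative if unavailable on this platform
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("load average not available");
            return;
        }
        System.out.printf("load %.2f on %d CPUs -> %.2f per CPU, near saturation: %b%n",
                load, cpus, loadPerCpu(load, cpus), nearSaturation(load, cpus));
    }
}
```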
if our CPUs are busy 75% of the time, shouldn't we see higher load average?
It's all about the pattern of activity: clearly, increased CPU utilization by some tasks (most likely a small minority) was not having an adverse effect on the processing of other tasks. If you could isolate the transactions being processed, I would expect you to see a new group emerging during the slowdown, while the existing task set was not affected.
update
One common scenario where high CPU can occur without a big increase in load is where a task triggers one (or a sequence) of other tasks, e.g. on receipt of a network request, the handler routes the request to a separate thread, and the separate thread then makes some asynchronous calls to other processes. The sampling of the run queue causes the load to be reported lower than it really is, but the load also does not rise linearly with CPU usage: the chain of triggered tasks would not have been runnable without the initial event, and because they occur (more or less) sequentially, the run queue is not inflated.
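The sequential hand-off described in the update can be sketched in Java with CompletableFuture: each stage does a short CPU burst and only then triggers the next, so the total CPU time is the sum of all stages while at most one stage is runnable at any moment. The class and method names are hypothetical, and the busy loop stands in for real request handling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a chain of stages where each one starts only after the previous finishes.
final class SequentialChainSketch {
    static final AtomicInteger active = new AtomicInteger();    // stages currently burning CPU
    static final AtomicInteger maxActive = new AtomicInteger(); // peak concurrency observed

    // Simulated CPU burst; records how many stages are busy at the same time.
    static long burn(int iterations) {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += i ^ (acc >>> 3); // arbitrary work whose result is used, so it is not dead code
        }
        active.decrementAndGet();
        return acc;
    }

    // Runs the chain and returns the peak number of simultaneously busy stages.
    static int runChain(int stages, int iterations) {
        maxActive.set(0);
        CompletableFuture<Long> f = CompletableFuture.completedFuture(0L);
        for (int s = 0; s < stages; s++) {
            // thenApplyAsync schedules this stage only once the previous stage has completed.
            f = f.thenApplyAsync(prev -> prev + burn(iterations));
        }
        f.join();
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrently busy stages: " + runChain(5, 10_000_000));
    }
}
```

However long the chain, the peak stays at one busy stage, so a run-queue sample would see at most one task from it even though the chain as a whole consumes several stages' worth of CPU.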
The OP originally provided indications that the aggregate CPU% was "2000%", suggesting there are many tasks using up CPU rather than just one busy process. If it were a consistent 2000% for a minute, you'd normally expect the load to be 20-ish.
– Matthew Ife
Feb 12 '15 at 14:10
...in a comment, not in the question, and he's not very sure about that. In the absence of the 'ALL' option, mpstat reports the total % usage, not the average. But that doesn't change the answer; it's about the pattern of activity.
– symcbean
Feb 12 '15 at 14:40
I'm 100% positive that the CPU utilization we see in the chart is the "average per CPU". mpstat is run without ALL, but that only leaves out the per-CPU info; the all row still shows the average per CPU. I will clarify the question.
– K Erlandsson
Feb 12 '15 at 14:43
Could you please elaborate on your last section a bit? I fail to grasp what you mean, and the part of my question you cited is the part I have the most trouble understanding.
– K Erlandsson
Feb 12 '15 at 14:44
edited Feb 12 '15 at 14:55
answered Feb 12 '15 at 13:55
symcbean
The OP originally provided indications that the aggregate CPU% was "2000%" suggesting there are many tasks using up CPU, rather than just 1 busy process. If it was a consistent 2000% for a minute you'd normally anticipate the load be 20-ish.
– Matthew Ife
Feb 12 '15 at 14:10
...in a comment, not in the question, and he's not very sure about that. In the absence of the 'ALL' option, mpstat reports the total % usage not the average. But that doesn't change the answer - it's about the pattern of activity.
– symcbean
Feb 12 '15 at 14:40
I'm 100% positive that the CPU util we see in the chart is the "average per CPU". Mpstat is run without ALL, but that only leaves out the per-CPU info; the "all" row still shows the average per CPU. I will clarify the question.
– K Erlandsson
Feb 12 '15 at 14:43
Could you please elaborate on your last section a bit? I fail to grasp what you mean, and the part of my question you cited is the part I have the most trouble understanding.
– K Erlandsson
Feb 12 '15 at 14:44
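The disagreement in these comments comes down to which of two readings the chart shows, and the two differ by a factor of the CPU count. A small Python sketch converting each reading into "CPUs' worth of busy time", assuming that 100% aggregate equals one fully busy CPU and using the 24 CPUs mentioned in the question's comments. If that busy time were spread across concurrently runnable tasks, the load average would naively be expected to land near the same figure, which is the intuition behind the "20-ish" estimate above.

```python
def busy_cpus_from_aggregate(total_pct):
    """Aggregate reading where 100% equals one fully busy CPU (e.g. "2000%")."""
    return total_pct / 100.0


def busy_cpus_from_average(avg_pct, ncpus):
    """Per-CPU average reading (0-100%), like the "all" row described above."""
    return avg_pct / 100.0 * ncpus


if __name__ == "__main__":
    ncpus = 24  # CPU count from the question's comments
    print(busy_cpus_from_aggregate(2000))     # 20.0
    print(busy_cpus_from_average(75, ncpus))  # 18.0
```

Either way the naive expectation is a load in the high teens, which is why a much lower observed load calls for an explanation based on the pattern of activity rather than on the utilization figure alone.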
Is the monitoring system showing normalized CPU load (load / #CPUs)? Regular Linux CPU load is hard to compare across systems with different core/CPU counts, so some tools use a normalized CPU load instead.
– Brian
Feb 12 '15 at 12:08
Do you mean dividing each data point by the number of CPUs, i.e. loadavg/24 in our case? I can easily create such a chart from the data if that helps.
– K Erlandsson
Feb 12 '15 at 12:14
I was suggesting your chart may already be showing that.
– Brian
Feb 12 '15 at 12:51
Ah, sorry for misunderstanding you. It would have been a nice explanation, but unfortunately it is the system-wide load average that is shown. I just triple checked.
– K Erlandsson
Feb 12 '15 at 12:54
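The normalized chart Brian describes is just the system-wide series divided by the CPU count. A minimal Python sketch, assuming the 24 CPUs mentioned above; the sample values are hypothetical, not taken from the question's chart.

```python
def normalize_load(series, ncpus):
    """Convert system-wide load-average samples into load per CPU.

    On a normalized chart, 1.0 corresponds to the load being equal to the
    number of CPUs.
    """
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return [load / ncpus for load in series]


if __name__ == "__main__":
    # Hypothetical system-wide 1-minute load samples, not the real data.
    samples = [3.0, 6.0, 12.0]
    print(normalize_load(samples, 24))  # [0.125, 0.25, 0.5]
```

On a 24-CPU machine the two kinds of chart differ by a factor of 24, so mistaking one for the other would make the load look far lower (or higher) than it is.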