Clinically Awesome: Load Average Explained (Sort of)

One day I was explaining load average over our internal instant messaging system to one of our newer tech support people. I felt that it might be worth sharing.

(02:28:37 PM) Technician: wat causes the load average on a sharedserver to be so high

(02:28:40 PM) Technician: ?

(02:28:48 PM) Technician: [root@sharedserver4121 root]# uptime

13:25:08 up 96 days, 12:01, 2 users, load average: 187.21, 185.44, 181.12

(02:28:59 PM) Jason: What is the meaning of load average?

(02:30:19 PM) Technician: The average load that the processor is under

(02:30:20 PM) Technician: ?

(02:30:44 PM) Jason: Okay, what does the number ‘187.21’ mean?

(02:31:21 PM) Technician: the present load or the load 5 min ago

(02:31:21 PM) Jason: aside from 100 + 80 + 7 + 2/10 + 1/100

(02:31:35 PM) Jason: but what is load? What is the unit of load?

(02:32:34 PM) Technician: a quantity that can be processed or transported at one time

(02:32:36 PM) Technician: ?

(02:34:25 PM) Jason: Good… you understand what a process is?

(02:36:51 PM) Technician: A process is a program that is running on the computer?

(02:36:53 PM) Technician: ?

(02:36:59 PM) Technician: :-/

(02:37:10 PM) Jason: ‘an instance of a program’, to be precise

(02:37:20 PM) Jason: what do you know about process status?

(02:39:15 PM) Technician: it tells you the status of the process that is currently running on the computer

(02:39:32 PM) Jason: what are the possible statuses?

(02:39:40 PM) Jason: or some of them, at least

(02:40:23 PM) Technician: cpu, memory uses

(02:40:27 PM) Jason: No

(02:40:52 PM) Jason: Do you know the difference between multiprocessing and multitasking?

(02:41:06 PM) Jason: (I’m not trying to give you a hard time, I just need to know where to start)

(02:41:50 PM) Technician: I know

(02:43:31 PM) Technician: multiprocessing is using more than one cpu on one computer and multitasking is sharings

(02:43:32 PM) Technician: ?

(02:43:40 PM) Jason: Correct

(02:44:45 PM) Jason: With multiprocessing you have multiple processors and can do more than one thing at a time. With multitasking (time sharing) the system simulates doing more than one thing at a time by switching between processes (or threads) very quickly.

(02:45:20 PM) Technician: ic

(02:45:25 PM) Jason: This is all managed by the process scheduler… part of the kernel

(02:50:58 PM) Jason: The scheduler keeps track of how long the current process has been running, which tasks are eligible to be run, each process’s priority, etc

(02:51:50 PM) Technician: so the sharedserver is using multitasking?

(02:52:04 PM) Jason: correct. sharedservers with hyperthreading do some multiprocessing

(02:52:15 PM) Technician: ic

(02:52:33 PM) Jason: The scheduler is very concerned with a process’s status. The status is shown by the column in ‘ps’ that says S, R, D, etc

(02:52:55 PM) Technician: because is not jus his stuff (site & email) is on it..more than one client is using it as well

(02:53:17 PM) Technician: my English isn;t all that great

(02:53:32 PM) Jason: more than that… there’s apache running, there’s ftp running, there’s sendmail running, there’s system housekeeping tasks running, etc

(02:54:00 PM) Jason: So, the status is really whether or not a process can run and if not, why not.

(02:54:08 PM) Technician: a sh*t load of stuff…which can cause the load to be pretty high

(02:54:32 PM) Jason: a shit load of stuff doesn’t necessarily mean a high load though

(02:54:52 PM) Technician: so many thing is waiting to be process

(02:54:56 PM) Technician: in other words

(02:55:05 PM) Jason: Well, it’s like this

(02:55:38 PM) Jason: Most processes that you see will be in the S status, which means ‘sleeping’. When processes don’t need to do anything they go to sleep. When they are needed they get woken up

(02:56:05 PM) Jason: For example there’s cron, which runs processes at certain times. It doesn’t constantly run, checking the time…

(02:56:37 PM) Jason: What it does is tell the kernel to wake it up at such and such time, then it goes to sleep

(02:56:49 PM) Technician: ic

(03:00:56 PM) Jason: When the appropriate time comes around, the kernel wakes up the process and it does whatever it needs to do, schedules a wake up call, and goes back to sleep

(03:02:19 PM) Jason: There are lots of reasons for a process to go to sleep. Perhaps it’s waiting on you to press a key or it’s waiting for someone to connect

(03:02:50 PM) Jason: When the process actually needs to do stuff, it’s Runnable ‘R’.

(03:03:20 PM) Jason: Runnable is not the same as Running. Runnable means that the process wants to use the processor but it doesn’t necessarily get to

(03:04:13 PM) Jason: If there’s only one processor and multiple runnable processes, they have to share. The scheduler takes care of the sharing. Whoever isn’t currently using the processor has to wait.
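A quick aside for anyone following along at home: the sharing described above can be sketched as a toy round-robin loop. This is only an illustration. The real kernel scheduler also weighs priorities and much more, and the process names, run times, and the `round_robin` helper are all made up for the example.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Return which process gets the processor in each time slice.

    burst_times maps a (hypothetical) process name to the amount of
    processor time it still needs; quantum is the length of one slice.
    """
    queue = deque(burst_times.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        timeline.append(name)          # this process runs for one slice
        remaining -= quantum
        if remaining > 0:
            queue.append((name, remaining))  # not done: back of the line
    return timeline

# One processor, three runnable processes (invented numbers).
print(round_robin({"apache": 3, "sendmail": 1, "ftp": 2}, quantum=1))
```

Only one name appears per slice, so everyone else in the queue is runnable but waiting.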

(03:05:14 PM) Jason: Another status is ‘D’ which means uninterruptible sleep, usually waiting on IO. That process has asked for some data and it hasn’t arrived yet or it’s asked for data to be written and the kernel hasn’t confirmed that the data has been written.

(03:05:51 PM) Jason: The D status is similar to runnable in that it has stuff to do, but in this case it’s being forced to wait on something other than the processor
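Another aside: on Linux the same status letters that ‘ps’ prints are also exposed in each process’s /proc/<pid>/stat file, as the field right after the command name in parentheses. Here is a minimal sketch of tallying them. The sample lines are invented and shortened, and `parse_state` and `count_states` are names I made up for illustration.

```python
from collections import Counter

def parse_state(stat_line):
    """Extract the one-letter state from a /proc/<pid>/stat style line.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so split on the LAST ')' instead of on whitespace.
    """
    after_comm = stat_line.rpartition(")")[2]
    return after_comm.split()[0]

def count_states(stat_lines):
    """Tally how many processes are in each state (S, R, D, ...)."""
    return Counter(parse_state(line) for line in stat_lines)

# Made-up, truncated sample lines in the "pid (comm) state ..." layout.
sample = [
    "101 (crond) S 1 101 101 0 -1",
    "202 (httpd) R 1 202 202 0 -1",
    "303 (my (odd) name) D 1 303 303 0 -1",
]
print(count_states(sample))
```

On a real box you would feed it the contents of every /proc/<pid>/stat file instead of the sample list.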

(03:06:19 PM) Jason: …which finally brings us to load average

(03:06:28 PM) Jason: still with me? :)

(03:06:36 PM) Technician: you should write a book

(03:06:44 PM) Technician: :-D..but yeah

(03:06:46 PM) Technician: go on

(03:06:47 PM) Jason: I’ve thought about it, but there’s already tons of books.

(03:07:45 PM) Jason: Many times a second (usually a hundred, but it’s configurable) the scheduler is woken up. It takes a survey of the status of various processes to find out who wants processor time

(03:08:31 PM) Jason: Every so often (about every five seconds on Linux) the kernel makes a note of how many processes want processor time. Linux counts the ones stuck in D too. The average of this is the load average

(03:08:55 PM) Jason: So, if my load average is 2.0, on average there have been two processes that want processor time

(03:09:07 PM) Jason: If I only have one processor, this means one running and one waiting

(03:09:16 PM) Technician: ic

(03:09:17 PM) Jason: If I have two processors, this means they each get to run

(03:09:39 PM) Jason: If I have four processors, this means two processors have (on average) been idle and two have been utilized
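The arithmetic in the last few messages can be written down as a tiny helper. This is just a rough reading of the number, assuming the load is made up entirely of processes that want the processor. `split_load` is a hypothetical name, not a real tool.

```python
def split_load(load, cpus):
    """Rough split of a load average against a processor count.

    Returns (busy, idle, waiting): processors in use on average,
    processors idle on average, and processes left waiting on average.
    """
    busy = float(min(load, cpus))
    idle = cpus - busy
    waiting = max(0.0, load - cpus)
    return busy, idle, waiting

# The same load of 2.0 on one, two, and four processors.
for cpus in (1, 2, 4):
    print(cpus, split_load(2.0, cpus))
```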

(03:09:54 PM) Technician: so in this case we have at lease three right…… 0.24, 0.32, 0.34

(03:10:06 PM) Technician: oh„,

(03:10:07 PM) Technician: okay

(03:10:08 PM) Jason: those are the averages over 1, 5, and 15 minutes

(03:10:12 PM) Technician: ic

(03:10:17 PM) Technician: coming back now

(03:10:36 PM) Jason: Although it’s a weighted average… older numbers have less weight. That part’s not so important though

(03:10:37 PM) Technician: okay
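For the curious, the "older numbers have less weight" part is an exponentially decaying average. Below is a floating-point sketch of the idea, assuming a sample every 5 seconds as Linux does. The kernel itself uses fixed-point math, so treat this as the shape of the calculation rather than the actual code. `update_load` and `SAMPLE_INTERVAL` are names invented for the example.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)

def update_load(previous, active, window_seconds):
    """Fold one new sample into a decaying average.

    previous: the load average so far
    active: how many processes wanted to run at this sample
    window_seconds: 60, 300, or 900 for the 1, 5, and 15 minute figures
    """
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return previous * decay + active * (1.0 - decay)

# An idle machine suddenly gets two busy processes for one minute.
load = 0.0
for _ in range(12):  # 12 samples * 5 s = 60 s
    load = update_load(load, active=2, window_seconds=60)
print(round(load, 2))  # prints 1.26
```

Note that after a full minute of two busy processes the 1 minute figure has only climbed to about 1.26, not 2.0, which is why the numbers lag behind what the machine is doing right now.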

(03:11:18 PM) Jason: So, what does a load of 184 mean?

(03:12:26 PM) Technician: there are 184 processors and 182 is being idle and two is being utilized

(03:12:52 PM) Jason: do you think that server you were asking about had 184 processors?

(03:13:01 PM) Technician: no

(03:13:11 PM) Technician: that sounds like to much

(03:13:35 PM) Jason: It wouldn’t be for a supercomputer but that’s not what we run for webservers ;)

(03:13:54 PM) Jason: The load average is related to processes not processors

(03:14:15 PM) Jason: It’s only a person interpreting the number that takes processors into account

(03:14:33 PM) Jason: So try again about what a load of 184 means

(03:15:34 PM) Jason: actually, not what it means, what it indicates. What is it a measure of?

(03:16:03 PM) Technician: 184 processes waiting

(03:16:03 PM) Technician: ?

(03:16:57 PM) Jason: Very, very close

(03:17:11 PM) Jason: It means 184 processes have wanted to run

(03:17:32 PM) Jason: Subtract the number of processors you have and that’s usually how many you have waiting
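If you want to try that subtraction on your own machine, Python’s standard library can fetch both numbers on Unix-like systems. A minimal sketch follows. `estimated_waiting` is a made-up helper, and the two-processor figure in the second print is purely an assumption, since the chat never says how many processors that server had.

```python
import os

def estimated_waiting(load, cpus):
    """Load minus processors, floored at zero: roughly how many are waiting."""
    return max(0.0, load - cpus)

try:
    one_min, five_min, fifteen_min = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print("load:", one_min, "processors:", cpus,
          "waiting (roughly):", estimated_waiting(one_min, cpus))
except (OSError, AttributeError):
    print("load average is not available on this platform")

# The load from the chat, with a hypothetical two-processor server.
print(estimated_waiting(184, 2))
```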


Sunday, June 3, 2007
