Re: move-accounting-function-calls-out-of-critical-vm-code-paths.patch

From: Andrew Morton <>
Date: 2005-02-04 09:09:04
Christoph Lameter <> wrote:
> As requested by Andrew:
> In the 2.6.11 development cycle function calls have been added to lots
> of hot vm paths to do accounting. I think these should not go into the
> final 2.6.11 release because these statistics can be collected in a different
> way that does not require the updating of counters from frequently used
> vm code paths and is consistent with the methods used elsewhere in the kernel
> to obtain statistics.
> These function calls are
> acct_update_integrals	-> Account for processes based on stime changes
> update_mem_hiwater	-> takes rss and total_vm hiwater marks.

Has any performance testing been done?

> acct_update_integrals is only useful to call if stime changes otherwise
> it will simply return. It is therefore best to relocate the function call
> to acct_update_integrals into the function that updates stime which is
> account_system_time and remove it from the vm code paths.

But that changes (breaks) the semantics significantly.  A task will now
only have its BSD accounting fields updated when it happens to be
interrupted by the timer.  Some tasks:

	for ( ; ; ) {
		nanosleep(2 milliseconds);
		do_stuff_for(0.5 milliseconds);
	}

will see their BSD accounting fields remaining stuck firmly at zero.

I think?
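
A toy user-space model of that loop, not kernel code, may make the concern concrete. It assumes a periodic timer tick, that a sleep expires on the first tick boundary at or after its deadline, and that the task starts running right on that tick. The function name `ticks_while_running` and all of its parameters are made up for illustration. It simply counts how many ticks land while the task is on the CPU, which under the proposed change is the only time the accounting would be updated.

```c
#include <assert.h>
#include <stdint.h>

/* Round t up to the next multiple of tick_us (the assumed wake-up point). */
static uint64_t round_up_to_tick(uint64_t t, uint64_t tick_us)
{
    return (t + tick_us - 1) / tick_us * tick_us;
}

/*
 * Simulate a sleep/work loop and return how many timer ticks fire while
 * the task is running. All times are in microseconds.
 */
unsigned ticks_while_running(uint64_t tick_us, uint64_t sleep_us,
                             uint64_t work_us, unsigned iterations)
{
    uint64_t now = 0;
    unsigned hits = 0;

    assert(tick_us > 0);
    for (unsigned i = 0; i < iterations; i++) {
        /* The sleep ends on a tick boundary; the task runs from there. */
        uint64_t start = round_up_to_tick(now + sleep_us, tick_us);
        uint64_t end = start + work_us;

        /* Count ticks strictly after the wake-up tick and before the
         * task goes back to sleep. */
        for (uint64_t t = (start / tick_us + 1) * tick_us; t < end; t += tick_us)
            hits++;

        now = end;
    }
    return hits;
}
```

Under these assumptions, `ticks_while_running(1000, 2000, 500, 1000)` stays at zero however many iterations are run, because each 0.5 ms burst finishes before the next 1 ms tick. Stretching the burst to 1500 us makes every iteration catch exactly one tick.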

> update_mem_hiwater finds the rss hiwater mark. RSS limits are checked in
> account_system_time().

Linux doesn't check rss limits anywhere.  We check CPU usage in

> Thus it makes most sense to also move the function
> call to update_mem_hiwater there. Otherwise a process may have a higher
> rss hiwater mark than allowed by rss limits!
> This means that the rss limit is not always updated if rss is increased
> and thus not as accurate. But the benefit is that the rss checks do not
> pollute the vm paths and that it is consistent with the rss limit check.

Again, updating the mm highwater metric at interrupt time means that the
metric can be wrong by arbitrary amounts, depending upon synchronisation
between the task's execution and the timer interrupt.
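
As a rough illustration of that point, here is a small user-space sketch, again not kernel code. It compares a high-water mark that is updated on every rss change with one that is only sampled periodically. The helper names `hiwater_every_change` and `hiwater_sampled`, and the `period`/`phase` parameters standing in for the timer interval and its alignment with the task, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* High-water mark when every rss value is observed. */
unsigned long hiwater_every_change(const unsigned long *rss, size_t n)
{
    unsigned long hi = 0;

    for (size_t i = 0; i < n; i++)
        if (rss[i] > hi)
            hi = rss[i];
    return hi;
}

/*
 * High-water mark when rss is only looked at every `period` steps,
 * starting at step `phase` (the alignment between task and timer).
 */
unsigned long hiwater_sampled(const unsigned long *rss, size_t n,
                              size_t period, size_t phase)
{
    unsigned long hi = 0;

    assert(period > 0);
    for (size_t i = phase; i < n; i += period)
        if (rss[i] > hi)
            hi = rss[i];
    return hi;
}
```

For the trace `{10, 10, 500, 10, 10, 10, 900, 10}`, the exact mark is 900. Sampling every fourth step gives 10 with phase 0 and 900 with phase 2, so the recorded value depends entirely on how the samples line up with the short-lived peaks.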

Most of this could be fixed up by updating these counters at schedule()
time as well, although that would become somewhat inaccurate if we later
decide to implement rss enforcement at pagefault time.

Received on Thu Feb 3 17:04:42 2005
