2013-06-10 17:28:44 UTC
Attached is my first take on the problem we discussed a few weeks ago -
limiting the number of deferred messages in the active queue and limiting
the number of delivery agents used by presumably "slow" deferred messages.
The patch contains several incremental parts which show how I developed
this and which make it more readable at the same time.
The first half of the patch implements the active queue limit. A few
comments:
- The qmgr_loop() now always round robins the queues. The original version
stopped doing that when the active queue was full and incoming mail kept
flowing in. I understand that this was done to prevent the deferred queue
from dominating the active queue, however the downside was that it could
also starve the deferred queue indefinitely. IMO the new mechanism prevents
the active queue from being stalled in a much better way, while remaining
fair at the same time.
- Coincidentally, the previous change can now result in more in-flow back
pressure being applied when the active queue becomes full, as the deferred
queue doesn't get ignored entirely. So this seems like an improvement, not
the regression Viktor was afraid of.
- The qmgr_deferred_message_limit is used internally to limit the number
of deferred messages in the active queue. However, I later realized that
from the user's point of view, it is perhaps easier if they can instead
specify how much of the active queue has to be left available for the
incoming queue. This has the advantage that they don't have to adjust this
when they increase the active queue size, so the default works better,
too. But then I am not entirely sure whether (and if so, how) the name of
the config variable should change, too...
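
To make the two points above concrete, here is a minimal sketch in C. It is
not the code from the patch: all names (active_state, deferred_cap,
may_admit, next_queue) are hypothetical, and it assumes the reserve for the
incoming queue is given as an absolute number of messages.

```c
#include <assert.h>

/* Hypothetical names throughout; this is an illustration, not the patch. */

enum scan_queue { SCAN_INCOMING = 0, SCAN_DEFERRED = 1 };

struct active_state {
    int active_limit;     /* active queue size, cf. qmgr_message_active_limit */
    int incoming_reserve; /* assumed: slots deferred mail may never occupy */
    int active_total;     /* messages currently in the active queue */
    int active_deferred;  /* how many of those came from the deferred queue */
};

/* Internal cap on deferred messages, derived from the user-visible reserve. */
static int deferred_cap(const struct active_state *st)
{
    int cap;

    assert(st->incoming_reserve >= 0);
    cap = st->active_limit - st->incoming_reserve;
    return cap > 0 ? cap : 0;
}

/* May a message from the given queue enter the active queue right now? */
static int may_admit(const struct active_state *st, enum scan_queue q)
{
    if (st->active_total >= st->active_limit)
        return 0;               /* active queue is full for everyone */
    if (q == SCAN_DEFERRED && st->active_deferred >= deferred_cap(st))
        return 0;               /* deferred mail has used up its share */
    return 1;
}

/* Always alternate between the queues, even when the last scan was refused. */
static enum scan_queue next_queue(enum scan_queue q)
{
    return q == SCAN_INCOMING ? SCAN_DEFERRED : SCAN_INCOMING;
}
```

With a reserve expressed this way, raising active_limit raises the deferred
cap automatically, which is why the default would not need to be retuned.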
The second half implements the delivery agent limit. A few comments:
- Skipping the slow jobs in qmgr_job_entry_select() turned out not to be
that difficult - the tricky part was of course verifying all the
implications of this change and coming up with relevant adjustments. The
trickiest part was in qmgr_transport_select(), which has to decide whether
a transport has some recipient entries ready for delivery. Rather than
duplicating all the window counting logic for slow jobs, I have decided to
use the fact that qmgr can change its mind after contacting the delivery
agent. This is something which already happens now from time to time, so
it seemed like a reasonable solution for a situation which is not time
- The fact that qmgr doesn't know the transport limit makes some things a
bit difficult. I could not use the "how much has to be left unused"
approach for the slow delivery agent limit, for example, so people have to
adjust this limit when increasing the transport or process limit. I have
also added a warning to make sure people don't set this limit too low,
but it can only trigger once the delivery agents start getting maxed out.
It cannot detect the case when this limit is set too high, either.
I don't know how much of an issue all this is, but it might be worth
considering having master somehow pass these limits to qmgr on startup.
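
The job-skipping idea can be sketched roughly as follows. Again, this is
not the code from the patch: struct job, select_job() and the counters are
hypothetical, and "slow" is assumed to simply be a per-job flag.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a job; not the qmgr data structure. */
struct job {
    int is_slow;   /* assumed flag: job belongs to a "slow" deferred message */
    int pending;   /* recipient entries still waiting for a delivery agent */
};

/*
 * Return the index of the first job that may be handed to a delivery
 * agent, or -1 if there is none. Slow jobs are skipped while the number
 * of agents already busy with slow jobs has reached the limit.
 */
static int select_job(const struct job *jobs, size_t count,
                      int slow_busy, int slow_limit)
{
    int slow_blocked = (slow_busy >= slow_limit);
    size_t i;

    for (i = 0; i < count; i++) {
        if (jobs[i].pending <= 0)
            continue;           /* nothing left to deliver for this job */
        if (jobs[i].is_slow && slow_blocked)
            continue;           /* slow share of the agents is used up */
        return (int) i;
    }
    return -1;
}
```

Note that a result of -1 can mean "only slow jobs are left and they are
blocked", which is the case where the caller learns that nothing is
deliverable only after it has already picked the transport.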
And a few overall comments:
- I have only added the basic docs to qmgr(8). I didn't regenerate
the manpages, didn't touch the postconf docs, nor any other docs like
postfix tuning. IMO this can all wait until the names and meaning of
the new variables get finalized.
- As much as I would have liked to, I had neither the time nor a suitable
testing environment to actually test this patch. I did my best to make
sure it is correct, though, and I hope someone here will be kind enough
to put it to the test.