Post by Viktor Dukhovni
>>> The reasonable response to latency spikes is creating concurrency
Post by Wietse Venema
>> By design, Postfix MUST be able to run in a fixed resource budget.
>> Your on-demand concurrency spikes break this principle and will
>> result in unexpected resource exhaustion.
Post by Patrik Rak
> I'd second Wietse on this one.
And yet you're missing the point.
Post by Patrik Rak
> If you throw in more resources for everyone, the bad guys are gonna
> claim it sooner or later. You have to make sure you give it only to
> the good guys, which is the same as giving less to the bad guys in
> the first place. No need to throw in yet more additional resources
We don't know who the "good guys" are and who the "bad guys" are.
- A deferred message may simply be greylisted and may deserve
timely delivery on its 2nd or 3rd (if the second was a bit too
early) delivery attempt.
- A small burst of fresh messages may be a pile of poop destined
to dead domains, and may immediately clog the queue for 30-300
Post by Patrik Rak
> And that's also why it is important to classify ahead of time, as
> once you give something away, it's hard to take it back.
There is no "giving away": to maintain throughput, high-latency
tasks warrant higher concurrency. Such concurrency is cheap, since
the delivery agents spend most of their time just sitting there
By *moving* the process count from the fast column to the slow
column in real-time (based on actual delivery latency, not some
heuristic prediction), we free up precious slots for fast deliveries,
which are fewer in number. Nothing I'm proposing creates less
opportunity for delivery of new mail, rather I'm proposing dynamic
(up to a limit) higher concurrency that soaks up a bounded amount
of high latency traffic (ideally all of it most of the time).
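
To put rough numbers on why high-latency deliveries warrant more
slots, here is a minimal sketch based on Little's law (mean number
of deliveries in flight = arrival rate * mean latency). The rates
and latencies are illustrative assumptions, not measurements, and
slots_needed is a hypothetical helper, not anything in Postfix.

```python
import math

def slots_needed(rate_per_sec: float, latency_sec: float) -> int:
    """Little's law: mean deliveries in flight = arrival rate * mean latency."""
    return math.ceil(rate_per_sec * latency_sec)

# Assumed workload: lots of fast mail, a trickle of slow mail.
fast = slots_needed(rate_per_sec=10.0, latency_sec=0.5)
slow = slots_needed(rate_per_sec=1.0, latency_sec=30.0)

print(f"fast mail needs about {fast} slots")
print(f"slow mail needs about {slow} slots")
```

Under these assumptions the slow trickle needs several times the
slots of the fast bulk, even though it is a tenth of the volume.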
To better understand the factors that impact the design we need
to distinguish between burst pressure and steady-state pressure.
When a burst of bad new mail arrives, your proposal takes it through
the fast path which gets congested "once" (by each message anyway,
but if the burst is large enough, the effect can last quite some
time). If the mail is simply slow to deliver, but actually leaves
the queue, that's all. Otherwise the burst gets deferred, and now
gets the slow path, which does not further congest delivery of new
mail, but presumably makes multiple trips through the deferred queue,
causing congestion there each time, amplified if you allocate fewer
processes to the slow than the fast path (I would strongly discourage
In any case the fast/slow path fails to completely deal with bursts.
So let's consider steady-state. Suppose bad mail trickles in as a
fraction "0 < b < 1" of the total new mail stream, at a rate that
does lead to enough congested fast path processes just from new
mail. What happens after that?
Well, in steady-state, each initially deferred message (which, for
the worst case, we assume continues to tempfail until it expires) gets
retried N times, where N grows with the maximum queue lifetime and
shrinks with the maximal backoff time (details later). Therefore,
the rate at which bad messages enter the active queue from the
deferred queue is approximately N * b * new_mail_input_rate.
When is that a problem? When N * b >> 1, because now a small
trickle of bad new mail becomes a steady stream of retried bad
mail whose volume is "N * b" times the total new mail input. So what can we do to
reduce the impact?
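
The steady-state estimate above can be written out as a short
sketch. The function names and the sample numbers (1000 new
messages per hour, 5% bad, 100 retries) are assumptions chosen
for illustration only.

```python
def deferred_retry_rate(new_mail_rate: float, bad_fraction: float, retries: float) -> float:
    """Approximate rate of bad mail re-entering the active queue: N * b * rate."""
    return retries * bad_fraction * new_mail_rate

def amplification(bad_fraction: float, retries: float) -> float:
    """Retried bad mail relative to the total new mail stream: N * b."""
    return retries * bad_fraction

# Assumed inputs: 1000 new messages/hour, 5% bad, each retried 100 times.
rate = deferred_retry_rate(new_mail_rate=1000.0, bad_fraction=0.05, retries=100)
factor = amplification(bad_fraction=0.05, retries=100)

print(f"retried bad mail: about {rate:.0f} messages/hour")
print(f"that is about {factor:.1f} times the new mail stream")
```

With these numbers the deferred queue feeds the active queue
several times as much mail as arrives fresh, which is the
"N * b >> 1" regime.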
I am proposing raising concurrency for just the bad mail, without
subtracting concurrency for the good mail, thereby avoiding collateral
damage to innocent bystanders (greylisted mail for example). This
also deals with the initial burst (provided the higher concurrency
for slow mail is high enough to absorb the most common bursts and
low enough to not run out of RAM or kernel resources). This does
no harm! It can only help.
You're proposing a separate transport for previously deferred mail.
This can help, but it can also hurt if the concurrency for the slow
path is lower than for the fast path. Otherwise it is just a variant of
my proposal, in which we guess who's good and who's bad in advance,
and avoid spillover from the bad processes into the good when the
bad limit is reached. In both cases total output concurrency should
rise. Each performs better in some cases and worse in others.
The two are composable: we could have a dedicated transport for
previously deferred mail with a separate process limit for slow
vs. fast mail if we really wanted to get fancy. We could even
throw in Wietse's prescreen for DNS prefetching, making a further
dent in latency. All three would be a lot of work of course.
So what have we not looked at yet? We've not considered trying to
reduce "N * b", which amounts to reducing "N" since "b" is outside
our control to some degree (though if you can accept less junk,
that's by far the best place to solve the problem, e.g. validate
the destination domain interactively while the user is submitting
the form).
So what controls "N"? With exponential backoff we reach the
maximum backoff time within a small number of retries, especially
because this backoff time is actually a lower bound on the
spacing between deliveries. Actual deliveries are typically spaced
wider, so the time grows faster than a simple power of two.
Therefore, to a good approximation, we can assume that the retry
count for steady-state bad mail is queue_lifetime/maximal_backoff.
Let's plug in the defaults:
$ echo "1k 86400 5 * 4000 / p" | dc
That's ~100 retries in 5 days. This concentrates bad mail when
the bad mail is > 1% of the total. Suppose a site with unavoidable
garbage entering the queue has users who would rather find out
sooner that their mail did not reach its recipient than wait
a full 5 days, and so adjusts the queue lifetime down
to 2 days (I did that at Morgan Stanley, where this worked well
for the user community, RFCs to the contrary notwithstanding).
Then we get:
$ echo "1k 86400 2 * 4000 / p" | dc
Now N drops to ~40, which could make the difference between deferred
mail concentrating initial latency spikes and diluting them (at the
2.5% bad mail mark). What else can we do? Clearly, raise the maximal
backoff time. How does that help? Consider raising the maximal
backoff time from 4000s to 14400s (4 hours). Now we get:
$ echo "1k 86400 2 * 14400 / p" | dc
Now N is ~12, and we've won almost a factor of 10 from the default
settings. Unless the bad mail exceeds ~8% of the total input there is
no concentration and we don't need to discriminate against deferred
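
For anyone who prefers not to read dc, the same three estimates
can be reproduced with the sketch below, which also prints the
break-even bad-mail fraction 1/N at which retried mail stops being
diluted. retry_count is a hypothetical helper that implements only
the approximation N = queue_lifetime / maximal_backoff used in
this message, nothing more.

```python
DAY = 86400  # seconds

def retry_count(queue_lifetime_s: float, max_backoff_s: float) -> float:
    """Approximate retries for mail that never gets delivered: lifetime / max backoff."""
    return queue_lifetime_s / max_backoff_s

settings = [
    ("5d lifetime, 4000s backoff", 5 * DAY, 4000),
    ("2d lifetime, 4000s backoff", 2 * DAY, 4000),
    ("2d lifetime, 14400s backoff", 2 * DAY, 14400),
]

for label, lifetime, backoff in settings:
    n = retry_count(lifetime, backoff)
    # Retried bad mail matches the new mail stream when N * b = 1, i.e. b = 1/N.
    print(f"{label}: N = {n:.1f}, break-even bad fraction = {1 / n:.1%}")
```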
Is it reasonable to push the max backoff time this high? I think
so. By the time we have tried:
5m (default first retry)
80m (20% higher than the current ceiling of 4000s)
the message has been in the queue for 155 minutes (about 2.6 hours)
and has been tried 6 times. The next retry would normally be about
66 minutes later, but I'd delay it to 160 minutes, so such a message
would leave (if that is its fate however unlikely) after 6 hours instead
of 5. Is that sufficiently better? Otherwise, with the message already
6 hours late, do we have to try every hour or so? Or is every 4 hours
enough? I think it is.
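
The retry spacing discussed above can be checked with a small
sketch. It assumes an idealized schedule in which the delay simply
doubles from 5 minutes up to the ceiling. That is a simplification
for illustration, not the actual queue manager logic, and
retry_times is a hypothetical helper.

```python
def retry_times(first_delay_s: int, max_backoff_s: int, lifetime_s: int) -> list:
    """Elapsed time (seconds) at each retry under idealized doubling backoff."""
    times = []
    delay = first_delay_s
    elapsed = 0
    while elapsed + delay <= lifetime_s:
        elapsed += delay
        times.append(elapsed)
        # Double the delay, but never beyond the configured ceiling.
        delay = min(delay * 2, max_backoff_s)
    return times

TWO_DAYS = 2 * 86400
proposed = retry_times(300, 14400, TWO_DAYS)  # 4 hour ceiling
default = retry_times(300, 4000, TWO_DAYS)    # 4000s ceiling

print("minutes in queue after five retries:", proposed[4] / 60)
print("next delay with 14400s ceiling (min):", (proposed[5] - proposed[4]) / 60)
print("next delay with 4000s ceiling (min):", round((default[5] - default[4]) / 60, 1))
```

Under that assumption the first five retries add up to 155
minutes, and the following delay is 160 minutes with the higher
ceiling versus roughly 66 minutes with the 4000s ceiling.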
So the simplest improvement we can make is just to tune the backoff
and queue lifetime timers. If we then add process slots for blocked
messages (another factor of 5 in many cases) we are looking at raw
sewage (40% bad) entering the queue before the deferred queue is
any different from fresh mail.
Since we've managed 12 years with few complaints about this issue,
I think that the timer adjustment is the easiest first step. Users
can tune their timers today with no new code.