Discussion:
Virtual SMTP session caching for TLS?
Wietse Venema
2007-08-29 11:05:15 UTC
I would like to avoid doubling the query/update traffic. It should
be possible to use the same server for good news (use this socket)
and bad news (avoid this IP address).
A reasonable requirement.
Yes, so the key question is whether it makes more sense to use a cache
to sort hosts that recently worked to the front of the list, or to sort
hosts that fail to work to the back of the list. One can of course employ
both strategies.
The SMTP client sorts IP addresses by MX preference, and then walks
down that list. If connection caching is "desirable" and the
logical->socket mapping is successful, then of course that socket
must be used by the SMTP client.
Otherwise the cache is queried for each IP address in order of
MX preference. If the response for a given IP address is "stay
away" then simply skip the host (but be sure to log why).
I don't want to skip hosts that are believed dead, because this may
effectively throttle the destination, and the cache will have negative
entries after a single failure, so this control would be too sensitive
IMHO.
It's ignored only while the cache entry exists, but limiting the
cache life time of descriptor-less entries is a little tricky.
If we look closely at the positive cache, it moves a possibly worse than
best preference MX host to the front of the list (tried before looping
over remaining addresses). So we can think of the positive cache as
a re-ordering of the MX address list, with the extra feature of using
an existing connection.
Actually, it moves forward the "best known to work" host(s). If I
had implemented positive caching WITHOUT file descriptors, TLS
would have benefited as well.

It still is not too late to support descriptor-less POSITIVE caching,
so that we can avoid the complications of negative caching. Limiting
the reuse of positive cache entries can be tricky: we still want
the MTA to discover the other working hosts eventually. Perhaps
tossing a random coin (reuse or not) will be adequate.
Viewed through those particular glasses, a negative cache can also
reorder the MX host list (Postfix already never sends to a host with a
preference worse than or equal to its own; that trimming happens first).
Hosts believed dead would be tried after all other hosts have been tried.
If a non-empty list of fallback relay nexthop destinations is provided,
we can skip dead hosts for all but the last fallback nexthop, so that
destinations with all MX hosts dead proceed to the fallback relay
in a timely fashion (until at least one of the negative cache
entries expires).
We still need to explore the possibilities of positive and negative
caching more fully.

More after tea time.

Wietse
In the cache server, the tree-based data structures for positive
caching are over-kill for negative caching. Instead, a CTABLE
indexed by client IP address should be sufficient. The CTABLE
already removes old entries when the cache fills up. And instead
of one expensive expiration event timer per negative cache entry,
just drop too old negative entries when they show up as the answer
to a negative cache query.
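
Something like this, perhaps (a self-contained sketch with a fixed-size
table and a made-up lifetime; collisions simply overwrite older entries):

#include <string.h>
#include <time.h>

#define NEG_CACHE_SIZE  1024            /* fixed size, no per-entry timers */
#define NEG_CACHE_TTL   30              /* assumed lifetime in seconds */

struct neg_entry {
    char    addr[64];                   /* IP address, the lookup key */
    time_t  stamp;                      /* when the failure was recorded */
};

static struct neg_entry neg_cache[NEG_CACHE_SIZE];

static unsigned neg_hash(const char *addr)
{
    unsigned h = 5381;

    while (*addr)
        h = h * 33 + (unsigned char) *addr++;
    return (h % NEG_CACHE_SIZE);
}

void    neg_cache_store(const char *addr)
{
    struct neg_entry *ep = neg_cache + neg_hash(addr);

    strncpy(ep->addr, addr, sizeof(ep->addr) - 1);
    ep->addr[sizeof(ep->addr) - 1] = 0;
    ep->stamp = time((time_t *) 0);
}

 /* Lazy expiration: a too-old entry is dropped when it shows up as the
  * answer to a query, instead of by a per-entry timer. */
int     neg_cache_lookup(const char *addr)
{
    struct neg_entry *ep = neg_cache + neg_hash(addr);

    if (strcmp(ep->addr, addr) != 0)
        return (0);                     /* not cached */
    if (time((time_t *) 0) - ep->stamp > NEG_CACHE_TTL) {
        ep->addr[0] = 0;                /* expired: drop on discovery */
        return (0);
    }
    return (1);                         /* still believed dead */
}
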
If this moves to an implementation, I'll take a look at the table
types...
When you have a moment to reflect, I am curious whether you think this use
case (high volume TLS to partially live destinations) is best solved with
a negative cache (as discussed above) or a positive cache with an ignored
descriptor (forcing a new connection to a previously good endpoint).
The negative cache is considerably more work, but could prove beneficial
for more than just TLS, because both positive and negative caching
can make separate contributions to improved throughput to hostile
destinations. Does the likely benefit justify the increased implementation
cost?
--
Viktor.
Wietse Venema
2007-08-29 21:06:00 UTC
Post by Wietse Venema
It's ignored only while the cache entry exists, but limiting the
cache life time of descriptor-less entries is a little tricky.
If we look closely at the positive cache, it moves a possibly worse than
best preference MX host to the front of the list (tried before looping
over remaining addresses). So we can think of the positive cache as
a re-ordering of the MX address list, with the extra feature of using
an existing connection.
Actually, it moves forward the "best known to work" host(s). If I
had implemented positive caching WITHOUT file descriptors, TLS
would have benefited as well.
It still is not too late to support descriptor-less POSITIVE caching,
so that we can avoid the complications of negative caching. Limiting
the reuse of positive cache entries can be tricky: we still want
the MTA to discover the other working hosts eventually. Perhaps
tossing a random coin (reuse or not) will be adequate.
It's not tricky if we use similar policies with descriptor-full
and descriptor-less positive connection caching:

- Increment a reference counter when an entry is stored,

- Decrement a reference counter when an entry is retrieved,

- Discard an entry when its reference count reaches zero.

- Discard an entry that hasn't been retrieved for some time.

- Discard an entry that has been retrieved too many times.

This encourages both TLS and non-TLS SMTP clients to discover
working servers during the "ramp up" phase before the concurrency
is maxed out. After that phase, the TLS SMTP clients will keep
reusing IP addresses until their reuse limits are reached, while
the non-TLS SMTP clients will keep reusing sockets until their
reuse limits are reached.
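
As a sketch, the per-entry bookkeeping could look like this (the idle
and reuse limits below are illustrative, not actual defaults):

#include <time.h>

#define MAX_IDLE_TIME   2               /* illustrative idle limit, seconds */
#define MAX_USE_COUNT   10              /* illustrative retrieval limit */

struct cache_entry {
    int     refcount;                   /* stores minus retrievals */
    int     use_count;                  /* total retrievals so far */
    time_t  last_used;                  /* time of last store or retrieval */
};

 /* Increment the reference counter when an entry is stored. */
void    entry_store(struct cache_entry *ep)
{
    ep->refcount += 1;
    ep->last_used = time((time_t *) 0);
}

 /* Decrement on retrieval. The return value says whether the entry must
  * now be discarded: the reference count reached zero, the entry idled
  * too long, or it was retrieved too many times. (The idle test is done
  * lazily here; a periodic sweep would also work.) */
int     entry_retrieve(struct cache_entry *ep)
{
    time_t  now = time((time_t *) 0);
    int     discard;

    ep->refcount -= 1;
    ep->use_count += 1;
    discard = (ep->refcount <= 0
               || now - ep->last_used > MAX_IDLE_TIME
               || ep->use_count > MAX_USE_COUNT);
    ep->last_used = now;
    return (discard);
}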

This changes the semantics of the reuse count as logged to the
maillog file, but it is still useful information.

The changes are how to represent a descriptor-less entry in the
cache (a special file number or null pointer), in the scache protocol
(an indicator that no file descriptor will be passed), and what a
descriptor-less result looks like from the client API (a special
file number).
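
For instance (one possible encoding, not a committed design):

 /* Hypothetical sentinel: the scache client API returns this "file
  * descriptor" when an endpoint is known good but no real descriptor
  * was stored with the entry. */
#define SCACHE_NO_FD    (-2)            /* distinct from -1 (lookup failed) */

int     scache_hit_is_descriptorless(int fd)
{
    return (fd == SCACHE_NO_FD);
}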

Descriptor-less positive/negative connection caching won't help
when all the servers are behind the same IP address.

Wietse
Victor Duchovni
2007-08-29 21:43:57 UTC
Post by Wietse Venema
Post by Wietse Venema
It still is not too late to support descriptor-less POSITIVE caching,
so that we can avoid the complications of negative caching. Limiting
the reuse of positive cache entries can be tricky: we still want
the MTA to discover the other working hosts eventually. Perhaps
tossing a random coin (reuse or not) will be adequate.
It's not tricky if we use similar policies with descriptor-full
- Increment a reference counter when an entry is stored,
- Decrement a reference counter when an entry is retrieved,
- Discard an entry when its reference count reaches zero.
- Discard an entry that hasn't been retrieved for some time.
- Discard an entry that has been retrieved too many times.
I had exactly this strategy in mind: require N stores for N fetches.
Post by Wietse Venema
This encourages both TLS and non-TLS SMTP clients to discover
working servers during the "ramp up" phase before the concurrency
is maxed out. After that phase, the TLS SMTP clients will keep
reusing IP addresses until their reuse limits are reached, while
the non-TLS SMTP clients will keep reusing sockets until their
reuse limits are reached.
This changes the semantics of the reuse count as logged to the
maillog file, but it is still useful information.
Yes, this is why I was asking about logging earlier in the thread. I am
tempted to say that virtual reuse is not actually "reuse" (in the sense
of talking over an already established channel). Since we have expire
times and not reuse count limits, it is OK to leave the re-use counter
of descriptor-less positive cache entries at 0.
Post by Wietse Venema
The changes are how to represent a descriptor-less entry in the
cache (a special file number or null pointer), in the scache protocol
(an indicator that no file descriptor will be passed), and what a
descriptor-less result looks like from the client API (a special
file number).
Descriptor-less positive/negative connection caching won't help
when all the servers are behind the same IP address.
Yes: when one has a pool of MX hosts behind a load balancer, where connection
acquisition is unreliable but re-use is viable once connected, only
true connection caching helps. We don't get that for TLS unless we have
a pool of multi-servers mediating TLS communication, and I don't want
to go there quite yet (or ever).

My focus is on more traditional multi MX host environments (most commonly
inbound MTA pools behind a perimeter gateway) where single MTA outages can
cause severe congestion without connection caching.

When real connection caching is available, negative caching may not help
much: connections to dead hosts are slow, so the information that a host
is dead takes a long time to be discovered, and multiple threads
(processes) end up discovering the same fact in parallel. Negative
caching only pays off if its lifetime substantially exceeds the
connection re-use expire time (300s), and that is likely too long.

If negative caching is good only when descriptor-full positive caching
is not an option, then it seems the descriptor-less positive caching
is the more sensible approach...

The implementation issues are:

- Use existing cache structure or consider moving to CTABLE?

- Store a /dev/null descriptor or add support for truly descriptor-less
cache entries?

- Protocol revisions?

- What does the reuse counter mean, and should it be incremented when
connections are remade based on a descriptor-less positive cache hit?
--
Viktor.
Wietse Venema
2007-08-29 23:14:29 UTC
Post by Victor Duchovni
Post by Wietse Venema
Post by Wietse Venema
It still is not too late to support descriptor-less POSITIVE caching,
so that we can avoid the complications of negative caching. Limiting
the reuse of positive cache entries can be tricky: we still want
the MTA to discover the other working hosts eventually. Perhaps
tossing a random coin (reuse or not) will be adequate.
It's not tricky if we use similar policies with descriptor-full
- Increment a reference counter when an entry is stored,
- Decrement a reference counter when an entry is retrieved,
- Discard an entry when its reference count reaches zero.
- Discard an entry that hasn't been retrieved for some time.
- Discard an entry that has been retrieved too many times.
I had exactly this strategy in mind, require N stores for N fetches.
Post by Wietse Venema
This encourages both TLS and non-TLS SMTP clients to discover
working servers during the "ramp up" phase before the concurrency
is maxed out. After that phase, the TLS SMTP clients will keep
reusing IP addresses until their reuse limits are reached, while
the non-TLS SMTP clients will keep reusing sockets until their
reuse limits are reached.
This changes the semantics of the reuse count as logged to the
maillog file, but it is still useful information.
Yes, this is why I was asking about logging earlier in the thread, I am
tempted to say that virtual reuse is not actually "reuse" (in the sense
of talking over an already established channel). Since we have expire
times and not reuse count limits, it is OK to leave the re-use counter
of descriptor-less positive cache entries at 0.
Post by Wietse Venema
The changes are how to represent a descriptor-less entry in the
cache (a special file number or null pointer), in the scache protocol
(an indicator that no file descriptor will be passed), and what a
descriptor-less result looks like from the client API (a special
file number).
Descriptor-less positive/negative connection caching won't help
when all the servers are behind the same IP address.
Yes, if one has a pool of MX hosts behind a load balancer, and connection
acquisition is unreliable, but once connected, re-use is viable, only
true connection caching helps. We don't get that for TLS unless we have
a pool of multi-servers mediating TLS communication, and I don't want
to go there quite yet (or ever).
Cached plain-text sockets to single-server TLS proxies would do.
But it may be less work to add frozen session support to OpenSSL.
Post by Victor Duchovni
My focus is on more traditional multi MX host environments (most commonly
inbound MTA pools behind a perimeter gateway) where single MTA outages can
cause severe congestion without connection caching.
When real connection caching is available, negative caching may not help
much, because connections to dead hosts are slow and so the information
that a host is dead takes a long time to be discovered, so there will
be multiple threads (processes) discovering the same fact in parallel,
the negative caching may not help much unless its lifetime substantially
exceeds the connection re-use expire time (300s), and this is likely
too long.
If negative caching is good only when descriptor-full positive caching
is not an option, then it seems the descriptor-less positive caching
is the more sensible approach...
- Use existing cache structure or consider moving to CTABLE?
It isn't broken, so no need to fix it now.
Post by Victor Duchovni
- Store a /dev/null descriptor or add support for truly descriptor-less
cache entries?
The latter would be cleaner.
Post by Victor Duchovni
- Protocol revisions?
- What does the reuse counter mean, and should it be incremented when
connections are remade based on a descriptor-less positive cache hit?
Not logging the reuse count is actually more work. BTW it's not
obvious what reuse count the scache server would send to the scache
client, anyway.

Wietse
Victor Duchovni
2007-08-30 00:17:38 UTC
Post by Wietse Venema
Post by Victor Duchovni
- What does the reuse counter mean, and should it be incremented when
connections are remade based on a descriptor-less positive cache hit?
Not logging the reuse count is actually more work. BTW it's not
obvious what reuse count the scache server would send to the scache
client, anyway.
The scache server is completely unaware of the structure of the endpoint
props field; the reuse counter is incremented in the SMTP client,
which can simply leave it at zero when remaking connections for
descriptor-less positive cache hits.

Sounds like we have most of a strategy. It will take me a bit of
experimentation to arrive at the right implementation, so I don't expect
to have it ready in under a few weeks of part-time tinkering. If anyone
has further design suggestions in the mean-time, I am all ears.

One last interface question: does this feature (caching descriptor-less
TLS endpoints) need a separate configuration parameter, or should
it just be on when descriptor-full connection caching is on? If
a separate boolean parameter is used, what do we call it and is
"$smtp_connection_cache_on_demand" a sensible default?

Ditto, for destinations where connection caching is always on
via $smtp_connection_cache_destinations.
--
Viktor.
Wietse Venema
2007-08-30 00:31:41 UTC
Post by Victor Duchovni
Post by Wietse Venema
Post by Victor Duchovni
- What does the reuse counter mean, and should it be incremented when
connections are remade based on a descriptor-less positive cache hit?
Not logging the reuse count is actually more work. BTW it's not
obvious what reuse count the scache server would send to the scache
client, anyway.
The scache server is completely unaware of the structure of the endpoint
props field; the reuse counter is incremented in the SMTP client,
which can simply leave it at zero when remaking connections for
descriptor-less positive cache hits.
Correct. The less the scache server knows the better; the SMTP
client enforces the reuse limit.
Post by Victor Duchovni
Sounds like we have most of a strategy. It will take me a bit of
experimentation to arrive at the right implementation, so I don't expect
to have it ready in under a few weeks of part-time tinkering. If anyone
has further design suggestions in the mean-time, I am all ears.
One last interface question: does this feature (caching descriptor-less
TLS endpoints) need a separate configuration parameter, or should
it just be on when descriptor-full connection caching is on? If
a separate boolean parameter is used, what do we call it and is
"$smtp_connection_cache_on_demand" a sensible default?
It should probably be on by default. As long as it's triggered by
the queue manager, we won't saturate the scache server with
boatloads of junk that will not be reused. Perhaps it should have
its own main.cf parameter names, parallel to the existing ones.
We can merge them later.
Post by Victor Duchovni
Ditto, for destinations where connection caching is always on
via $smtp_connection_cache_destinations.
Wietse
Victor Duchovni
2007-08-30 01:36:30 UTC
Post by Victor Duchovni
Sounds like we have most of a strategy. It will take me a bit of
experimentation to arrive at the right implementation, so I don't expect
to have it ready in under a few weeks of part-time tinkering. If anyone
has further design suggestions in the mean-time, I am all ears.
I forgot one important consideration in the negative vs. positive
discussion. The picture is made more complex by "nolisting".

Does "nolisting" look for each and every new connection to an active
MX host to be preceded by a connection to an inactive MX host (which
refuses connections with TCP RST)? If so, the negative cache may work
better, provided we don't do negative caching of hosts that quickly
refuse connections.

The negative cache would be used to stay away from high latency
black-holes, but not hosts that quickly say no. The definition of
"quickly" is not too difficult, either under a small factor of the
latency of an actual delivery, or under a suitable fraction of the
connection timeout.
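
In code, the test could be as simple as (the factor and fraction below
are placeholders, not tuned values):

 /* Negative-cache a failed connection attempt only if the failure was
  * slow; quick refusals (nolisting-style TCP RST) are not cached. */
int     worth_negative_caching(double fail_time,        /* time to failure */
                               double delivery_time,    /* typical delivery */
                               double connect_timeout)  /* configured limit */
{
    if (fail_time < 3.0 * delivery_time)        /* small factor of delivery */
        return (0);
    if (fail_time < 0.5 * connect_timeout)      /* fraction of the timeout */
        return (0);
    return (1);                                 /* slow black hole: cache it */
}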

Is "nolisting" a sufficiently good reason to avoid an otherwise good
positive cache?
--
Viktor.
Victor Duchovni
2007-08-30 04:26:36 UTC
Post by Victor Duchovni
Post by Victor Duchovni
Sounds like we have most of a strategy. It will take me a bit of
experimentation to arrive at the right implementation, so I don't expect
to have it ready in under a few weeks of part-time tinkering. If anyone
has further design suggestions in the mean-time, I am all ears.
I forgot one important consideration in the negative vs. positive
discussion. The picture is made more complex by "nolisting".
Sorry, "nolisting" is not a problem, the more aggressive "unlisting" is
what I had in mind.
Post by Victor Duchovni
Does "nolisting" look for each and every new connection to an active
MX host to be preceded by a connection to an inactive MX host (which
refuses connections with TCP RST)? If so, the negative cache may work
better, provided we don't do negative caching of hosts that quickly
refuse connections.
Jorey recommends a 4000s grace period for "unlisting", and there is no
requirement for *each* connection to first try the dead host; one just
has to have tried it not too long before contacting the good host.

This suggests that positive cache entries that live for ~300s are going
to work fine in practice, and in any case "unlisting" is presumably
even less common than "nolisting", but likely not completely negligible.
Post by Victor Duchovni
The negative cache would be used to stay away from high latency
black-holes, but not hosts that quickly say no. The definition of
"quickly" is not too difficult, either under a small factor of the
latency of an actual delivery, or under a suitable fraction of the
connection timeout.
Is "nolisting" a sufficiently good reason to avoid an otherwise good
positive cache?
So is positive caching still good enough?
--
Viktor.
Wietse Venema
2007-08-30 11:16:21 UTC
Post by Victor Duchovni
Post by Victor Duchovni
Sounds like we have most of a strategy. It will take me a bit of
experimentation to arrive at the right implementation, so I don't expect
to have it ready in under a few weeks of part-time tinkering. If anyone
has further design suggestions in the mean-time, I am all ears.
I forgot one important consideration in the negative vs. positive
discussion. The picture is made more complex by "nolisting".
Does "nolisting" look for each and every new connection to an active
MX host to be preceded by a connection to an inactive MX host (which
refuses connections with TCP RST)? If so, the negative cache may work
better, provided we don't do negative caching of hosts that quickly
refuse connections.
Positive caching will work around IP addresses that don't respond
(dead host) or that send TCP RST quickly (nolisting). The SMTP
client will never enter such addresses/connections into the cache,
and therefore such servers will be avoided.

If the SMTP client caches IP addresses/connections that don't
provide SMTP service, then the SMTP client needs to be fixed, not
the positive/negative caching method. The way the SMTP client is
implemented, this will never happen since a connection is cached
only after successful delivery.

Wietse
Post by Victor Duchovni
The negative cache would be used to stay away from high latency
black-holes, but not hosts that quickly say no. The definition of
"quickly" is not too difficult, either under a small factor of the
latency of an actual delivery, or under a suitable fraction of the
connection timeout.
Is "nolisting" a sufficiently good reason to avoid an otherwise good
positive cache?
--
Viktor.
Victor Duchovni
2007-08-30 13:24:44 UTC
Post by Wietse Venema
Post by Victor Duchovni
I forgot one important consideration in the negative vs. positive
discussion. The picture is made more complex by "nolisting".
Does "nolisting" look for each and every new connection to an active
MX host to be preceded by a connection to an inactive MX host (which
refuses connections with TCP RST)? If so, the negative cache may work
better, provided we don't do negative caching of hosts that quickly
refuse connections.
Positive caching will work around IP addresses that don't respond
(dead host) or that send TCP RST quickly (nolisting). The SMTP
client will never enter such addresses/connections into the cache,
and therefore such servers will be avoided.
If the SMTP client caches IP addresses/connections that don't
provide SMTP service, then the SMTP client needs to be fixed, not
the positive/negative caching method. The way the SMTP client is
implemented, this will never happen since a connection is cached
only after successful delivery.
I must not have explained myself clearly; the concern was that
descriptor-less caching of a good secondary MX will cause Postfix to reconnect
to it without first trying a non-functioning (always TCP RST) primary MX.

Now with "nolisting" this is not a problem, but with "unlisting" (SMTP
"port knocking" with fixed port/variable IP) it could be more problematic,
but likely still fine, given Jorey's recommended timeout (of 4000s).

If the "unlisting" timeout is lower than $smtp_connection_reuse_time_limit
(default 300s), Postfix may fail to connect to the working MX and defer
a message that could have been delivered.

Is this a reasonable concern, or does it just demonstrate another reason
why unlisting is not a good idea (other MTAs may also cache host status)?
We could take the view that "unlisting" is not sufficiently interesting
to design around.
--
Viktor.
Wietse Venema
2007-08-30 13:53:10 UTC
Post by Victor Duchovni
Post by Wietse Venema
Post by Victor Duchovni
I forgot one important consideration in the negative vs. positive
discussion. The picture is made more complex by "nolisting".
Does "nolisting" look for each and every new connection to an active
MX host to be preceded by a connection to an inactive MX host (which
refuses connections with TCP RST)? If so, the negative cache may work
better, provided we don't do negative caching of hosts that quickly
refuse connections.
Positive caching will work around IP addresses that don't respond
(dead host) or that send TCP RST quickly (nolisting). The SMTP
client will never enter such addresses/connections into the cache,
and therefore such servers will be avoided.
If the SMTP client caches IP addresses/connections that don't
provide SMTP service, then the SMTP client needs to be fixed, not
the positive/negative caching method. The way the SMTP client is
implemented, this will never happen since a connection is cached
only after successful delivery.
I must not have explained myself clearly; the concern was that
descriptor-less caching of a good secondary MX will cause Postfix to reconnect
to it without first trying a non-functioning (always TCP RST) primary MX.
Client-side caching of positive or negative server IP address status
information may interact poorly with servers whose IP address
status depends on previous client behavior.

Some server features (e.g., port knocking) interact poorly with
client-side positive caching of server IP address status, while
others (e.g., greylisting) interact poorly with client-side negative
caching of the same.

It's a fundamental problem with client-side caching of server IP
address status information. As people will invent more client
history dependent anti-spam mechanisms, the situation will only
get worse. But, given the focus on blocking bad clients, my bets
would be on client-side positive caching as less likely to break.

In any case, my recommendation for positive IP address status
caching is to use the same short timeouts as for socket caching.
This way, positive caching won't affect too many deliveries. If
it is something that requires the bother of manual tuning, then it
probably isn't worth the bother of implementing and maintaining.

Wietse
Victor Duchovni
2007-08-30 14:26:47 UTC
Post by Wietse Venema
It's a fundamental problem with client-side caching of server IP
address status information. As people will invent more client
history dependent anti-spam mechanisms, the situation will only
get worse. But, given the focus on blocking bad clients, my bets
would be on client-side positive caching as less likely to break.
In any case, my recommendation for positive IP address status
caching is to use the same short timeouts as for socket caching.
This way, positive caching won't affect too many deliveries. If
it is something that requires the bother of manual tuning, then it
probably isn't worth the bother of implementing and maintaining.
We're on the same page. There will not be a need for separate tuning,
the timeouts will be the same as for descriptor-full caching. After ~2s of
non-use positive cache entries will be dropped. So "unlisting" is only a
potential issue with a destination that has sustained traffic keeping the
positive entry active for its full expiration (~300s), and even then only
if the "unlisting" timeouts are much more aggressive than recommended.

I am game to move forward with that caveat...
--
Viktor.
Wietse Venema
2007-08-31 00:24:20 UTC
Post by Victor Duchovni
Post by Wietse Venema
It's a fundamental problem with client-side caching of server IP
address status information. As people will invent more client
history dependent anti-spam mechanisms, the situation will only
get worse. But, given the focus on blocking bad clients, my bets
would be on client-side positive caching as less likely to break.
In any case, my recommendation for positive IP address status
caching is to use the same short timeouts as for socket caching.
This way, positive caching won't affect too many deliveries. If
it is something that requires the bother of manual tuning, then it
probably isn't worth the bother of implementing and maintaining.
We're on the same page. There will not be a need for separate tuning,
the timeouts will be the same as for descriptor-full caching. After ~2s of
non-use positive cache entries will be dropped. So "unlisting" is only a
potential issue with a destination that has sustained traffic keeping the
positive entry active for its full expiration (~300s), and even then only
if the "unlisting" timeouts are much more aggressive than recommended.
I am game to move forward with that caveat...
Go, Victor, go.

Wietse
Victor Duchovni
2007-08-31 17:28:49 UTC
Post by Wietse Venema
Post by Victor Duchovni
We're on the same page. There will not be a need for separate tuning,
the timeouts will be the same as for descriptor-full caching. After ~2s of
non-use positive cache entries will be dropped. So "unlisting" is only a
potential issue with a destination that has sustained traffic keeping the
positive entry active for its full expiration (~300s), and even then only
if the "unlisting" timeouts are much more aggressive than recommended.
I am game to move forward with that caveat...
Go, Victor, go.
One more note before any code is written: it seems we can add a
useful control for demand connection caching (descriptor-full and
descriptor-less), that stores connections in the cache only if one of
the following additional constraints holds:

- the connection came from the cache (with or without a descriptor).

- we made a brand new connection not based on data from the cache,
and the setup latency ("c" time in delays=a/b/c/d) exceeded the
transaction latency ("d") by a factor of 3 or more.

This optimization would be subject to a new boolean control that can be
turned off on a per-transport basis, but should IMHO be on by default
for the "smtp" transport (and perhaps off the for "relay" transport).

Rationale:

- We avoid caching connections to (remote) servers when all is well,
and connection setup is cheap. We don't blame longer per-destination
active queue occupancy on connection setup cost and start caching,
unless the setup cost is actually observed to be high.

- Postfix is a better citizen: it avoids caching connections when there
is no need to do so.

Exceptions:

- At high message rates, sending to a fork/exec content filter leads
to high CPU utilization, lowering throughput, but connection setup
latency may not rise out of step with the equally CPU-intensive transaction
latency. Fork/exec filters ought not to have a high startup cost,
but this can happen. The feature should be disabled for transports
feeding such content filters. (Appropriate examples and language in
FILTER_README, ...)

- For high volume inbound relaying ("relay" transport), it is
possible that all servers accept connections readily, but some are I/O
saturated and have a higher transaction latency. We moved connection
re-use limits from counts to expiration time, to avoid "attractor"
behaviour, where the slowest server ends up with the most connections
(possibly using up all available concurrency, ...).

When sending to one's own relay domains, it is more reasonable
to demand a larger share of SMTP server resources. So the feature
(demand connection caching only when connection setup is expensive)
should be off for the relay transport.

Is this sensible? It can lead to better load distribution (connection
caching leads to more concentrated bursts of traffic to a subset of the MX
hosts handling a destination) and avoid server connection hogging in cases
when it yields no significant advantage...
--
Viktor.
Noel Jones
2007-08-31 18:20:14 UTC
Post by Victor Duchovni
One more note before any code is written, it seems we can add a
useful control for demand connection caching (descriptor-full and
descriptor-less), that stores connections in the cache only if one of
- the connection came from the cache (with or without a descriptor).
- we made a brand new connection not based on data from the cache,
and the setup latency ("c" time in delays=a/b/c/d) exceeded the
transaction latency ("d") by a factor of 3 or more.
This optimization would be subject to a new boolean control that can be
turned off on a per-transport basis, but should IMHO be on by default
for the "smtp" transport (and perhaps off the for "relay" transport).
Would this interact well with destinations that have multiple dead
hosts, but the one that works responds quickly (e.g. hotmail)?

Would this be automatically disabled for domains listed in
$smtp_connection_cache_destinations?

If yes, sounds good. But I can't imagine what you would call such a feature.

my lame stab at it...
smtp_connection_cache_on_demand {[yes], no}
    enable caching when there's lots of mail for a destination and
    postfix thinks the connection startup time is > 3x the transaction time

smtp_connection_cache_on_demand_aggressive {yes, [no]}??
    ignore startup vs. transaction times when deciding to cache a
    destination. [yes] may be useful for internal relays and content filters.
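
In main.cf form (the first parameter exists today; the second is only
the proposal above):

# Existing control: let the queue manager enable caching on demand.
smtp_connection_cache_on_demand = yes
# Proposed (hypothetical): cache without comparing startup time
# against transaction time.
smtp_connection_cache_on_demand_aggressive = no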
--
Noel Jones
Victor Duchovni
2007-08-31 18:31:58 UTC
Post by Noel Jones
Post by Victor Duchovni
One more note before any code is written, it seems we can add a
useful control for demand connection caching (descriptor-full and
descriptor-less), that stores connections in the cache only if one of
- the connection came from the cache (with or without a descriptor).
- we made a brand new connection not based on data from the cache,
and the setup latency ("c" time in delays=a/b/c/d) exceeded the
transaction latency ("d") by a factor of 3 or more.
This optimization would be subject to a new boolean control that can be
turned off on a per-transport basis, but should IMHO be on by default
for the "smtp" transport (and perhaps off the for "relay" transport).
Would this interact well with destinations that have multiple dead
hosts, but the one that works responds quickly (e.g. hotmail)?
Yes. Each time setup latency is encountered, a connection is added
to the cache and used until it expires (~300s). Not all good
sessions are added to the cache immediately (a form of slow start for
cache occupancy), but if latency is encountered for new connections often
enough, the cache fills with cached sessions, and the latency disappears.

Suppose you need a concurrency of 10 to keep mail flowing at steady state,
and 2 out of 5 MX hosts exhibit slow setup. Starting from an empty cache,
6 out of the first 10 sessions complete with no delay and don't fill
the cache, while 4 sessions try multiple MX hosts and get cached.

Now you have 4 cached sessions, but you need more concurrency, so new
connections are made again, with some getting quick service, and some
filling the cache. Very quickly you have 10 cached sessions and don't
see any latency.
Post by Noel Jones
smtp_connection_cache_on_demand {[yes], no}
enable caching when there's lots of mail for a destination and
postfix thinks the connection startup time is > 3x the transaction time
smtp_connection_cache_on_demand_aggressive {yes, [no]}??
ignore startup vs. transaction times when deciding to cache a
destination. [yes] may be useful for internal relays and content filters.
Thanks; should the "aggressive" version imply "cache on demand" or
modify it?
--
Viktor.
Noel Jones
2007-08-31 18:38:31 UTC
Post by Victor Duchovni
Post by Noel Jones
Would this interact well with destinations that have multiple dead
hosts, but the one that works responds quickly (e.g. hotmail)?
Yes.
Sounds good.
Post by Victor Duchovni
Post by Noel Jones
smtp_connection_cache_on_demand {[yes], no}
enable caching when there's lots of mail for a destination and
postfix thinks the connection startup time is > 3x the transaction time
smtp_connection_cache_on_demand_aggressive {yes, [no]}??
ignore startup vs. transaction times when deciding to cache a
destination. [yes] may be useful for internal relays and content filters.
Thanks, should the "aggressive" version imply "cache on demand" or
modify it?
I'm thinking modify it; no effect if cache on demand is "no".
--
Noel Jones
Wietse Venema
2007-09-01 14:25:48 UTC
Post by Victor Duchovni
Post by Noel Jones
Post by Victor Duchovni
One more note before any code is written, it seems we can add a
useful control for demand connection caching (descriptor-full and
descriptor-less), that stores connections in the cache only if one of
- the connection came from the cache (with or without a descriptor).
- we made a brand new connection not based on data from the cache,
and the setup latency ("c" time in delays=a/b/c/d) exceeded the
transaction latency ("d") by a factor of 3 or more.
This optimization would be subject to a new boolean control that can be
turned off on a per-transport basis, but should IMHO be on by default
for the "smtp" transport (and perhaps off the for "relay" transport).
Would this interact well with destinations that have multiple dead
hosts, but the one that works responds quickly (e.g. hotmail)?
Yes. Each time setup latency is encountered, a connection is added
to the cache and used until it expires (~300s). Not all good
sessions are added to the cache immediately (a form of slow start for
cache occupancy), but if latency is encountered for new connections often
enough, the cache fills with cached sessions, and the latency disappears.
Why is closing a successful new connection an optimization? This
"slow start" like approach only postpones the "steady state"
condition where all connections are cached anyway. By choosing an
arbitrary setup/transmission time ratio of 3, you're willing to
sacrifice a factor of up to 4 in mail throughput by not caching
any connections with sites that have moderate setup latencies.

If your concern is that the connection cache holds too many
connections, then that would be a cache server concern. Connection
cache resource management problems are better addressed by the
cache server, instead of by cache clients.

The cache server can manage its resources via many methods:

- Gradually reducing retention times when the cache fills up over
50%. This requires updating events.c with sub-second time
resolution, something that should have been done long ago anyway.

- Randomly dropping cache entries as a final measure.

- Partitioning the cache, so that we have separate pools with
different resource limits.

The connection cache will work out of the box with a limited
number of outbound mail delivery transports, so that the number
of cache entries is naturally bounded by O(process limit).
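
For the first of those methods, the retention-time reduction could be as
simple as (an illustrative formula, not actual scache server code):

 /* Scale down entry retention once the cache fills past 50%: full
  * time-to-live at or below half full, falling linearly to zero at
  * completely full. */
double  effective_ttl(double base_ttl, int used, int limit)
{
    double  occupancy = (double) used / limit;

    if (occupancy <= 0.5)
        return (base_ttl);
    return (base_ttl * 2.0 * (1.0 - occupancy));
}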

Wietse
Wietse Venema
2007-09-01 15:18:09 UTC
Post by Wietse Venema
Post by Victor Duchovni
Post by Noel Jones
Post by Victor Duchovni
One more note before any code is written, it seems we can add a
useful control for demand connection caching (descriptor-full and
descriptor-less), that stores connections in the cache only if one of
- the connection came from the cache (with or without a descriptor).
- we made a brand new connection not based on data from the cache,
and the setup latency ("c" time in delays=a/b/c/d) exceeded the
transaction latency ("d") by a factor of 3 or more.
This optimization would be subject to a new boolean control that can be
turned off on a per-transport basis, but should IMHO be on by default
for the "smtp" transport (and perhaps off the for "relay" transport).
Would this interact well with destinations that have multiple dead
hosts, but the one that works responds quickly (eg. hotmail)?
Yes. Each time setup latency is encountered, a connection is added
to the cache and used until it expires (~300s). Not all good
sessions are added to the cache immediately (a form of slow start for
cache occupancy), but if latency is encountered for new connections often
enough, the cache fills with cached sessions, and the latency disappears.
Why is closing a successful new connection an optimization? This
"slow start" like approach only postpones the "steady state"
condition where all connections are cached anyway. By choosing an
arbitrary setup/transmission time ratio of 3, you're willing to
sacrifice a factor of up to 4 in mail throughput by not caching
any connections with sites that have moderate setup latencies.
Having read the earlier post, I am still not convinced that this
idea solves more problems than it creates. Postfix already has ways
to be a good citizen: it keeps connections open only when it
knows that it is doing back-to-back deliveries to the same site.

/*
 * Turn on session caching after we get up to speed. Don't enable
 * session caching just because we have concurrent deliveries. This
 * prevents unnecessary session caching when we have a burst of mail
 * <= the initial concurrency limit.
 */
if ((queue->dflags & DEL_REQ_FLAG_SCACHE) == 0) {
    if (BACK_TO_BACK_DELIVERY()) {
        if (msg_verbose)
            msg_info("%s: allowing on-demand session caching for %s",
                     myname, queue->name);
        queue->dflags |= DEL_REQ_FLAG_SCACHE;
    }
}

And:

/*
 * Turn off session caching when concurrency drops and we're running
 * out of steam. This is what prevents us from turning off session
 * caching too early, and from making new connections while old ones
 * are still cached.
 */
else {
    if (!CONCURRENT_OR_BACK_TO_BACK_DELIVERY()) {
        if (msg_verbose)
            msg_info("%s: disallowing on-demand session caching for %s",
                     myname, queue->name);
        queue->dflags &= ~DEL_REQ_FLAG_SCACHE;
    }
}

Postfix is not naively assuming that (lots of mail in active queue
for site X) means that it will actually be able to sustain a
continuous mail flow to site X. It verifies the condition
before enabling connection caching.

For your setup/transmission latency dependent connection caching
to be implemented, it would have to be off by default, and it would
need a description of its adverse results. Considering that the
feature has limited use, I doubt it is worth the trouble of
implementing and maintaining.

Wietse
Victor Duchovni
2007-09-02 01:11:34 UTC
Post by Wietse Venema
Post by Wietse Venema
Why is closing a successful new connection an optimization? This
"slow start" like approach only postpones the "steady state"
condition where all connections are cached anyway. By choosing an
arbitrary setup/transmission time ratio of 3, you're willing to
sacrifice a factor of up to 4 in mail throughput by not caching
any connections with sites that have moderate setup latencies.
Having read the earlier post, I am still not convinced that this
idea solves more problems than it creates. Postfix already has ways
to be a good citizen: it keeps connections open only when it
knows that it is doing back-to-back deliveries to the same site.
Yes, I am aware of this.
Post by Wietse Venema
Postfix is not naively assuming that (lots of mail in active queue
for site X) means that it will actually be able to sustain a
continuous mail flow to site X. It verifies the condition
before enabling connection caching.
Yes, this is important. What I am trying to do is to combine the queue
manager's signal that caching may be appropriate with observations of
connection setup latency, and only cache when it is likely to be useful.
Caching when connection setup is quick is likely to make deliveries to
remote hosts less uniformly distributed over the available hosts, and will
in some cases cache connections that are, despite best effort, not re-used.
Post by Wietse Venema
For your setup/transmission latency dependent connection caching
to be implemented, it would have to be off by default, and it would
need a description of its adverse results. Considering that the
feature has limited use, I doubt it is worth the trouble of
implementing and maintaining.
If it is off by default, we may as well not bother. A good citizenship
feature that is not on by default is not very useful :-(

I'll do some data mining on "c/d" ratios for new and cached connections
from a few weeks of logs, and see whether the distribution is sufficiently
bi-modal to admit a natural classification into fast/slow buckets.
--
Viktor.
Wietse Venema
2007-09-02 01:40:18 UTC
Post by Victor Duchovni
Post by Wietse Venema
Post by Wietse Venema
Why is closing a successful new connection an optimization? This
"slow start" like approach only postpones the "steady state"
condition where all connections are cached anyway. By choosing an
arbitrary setup/transmission time ratio of 3, you're willing to
sacrifice a factor of up to 4 in mail throughput by not caching
any connections with sites that have moderate setup latencies.
Having read the earlier post, I am still not convinced that this
idea solves more problems than it creates. Postfix already has ways
to be a good citizen: it keeps connections open only when it
knows that it is doing back-to-back deliveries to the same site.
Yes, I am aware of this.
Post by Wietse Venema
Postfix is not naively assuming that (lots of mail in active queue
for site X) means that it will actually be able to sustain a
continuous mail flow to site X. It verifies the condition
before enabling connection caching.
Yes, this is important, what I am trying to do is to combine the queue
manager's signal that caching may be appropriate with observations of
connection setup latency, and only cache when it is likely to be useful.
Caching when connection setup is quick is likely to make deliveries to
remote hosts less uniformly distributed over the available hosts, and will
in some cases cache connections that are despite best effort not re-used.
Post by Wietse Venema
For your setup/transmission latency dependent connection caching
to be implemented, it would have to be off by default, and it would
need a description of its adverse results. Considering that the
feature has limited use, I doubt it is worth the trouble of
implementing and maintaining.
If it is off by default, we may as well not bother. A good citizenship
feature that is not on by default is not very useful :-(
It is NOT a good citizenship feature. It is a shoot-yourself-in-the-foot
feature that throws away a factor of two (or whatever) in
Postfix mail delivery performance.
Post by Victor Duchovni
I'll do some data mining on "c/d" ratios for new and cached connections
from a few weeks of logs, and see whether the distribution is sufficiently
bi-modal to admit a natural classification into fast/slow buckets.
According to my own data, wide area setup times are already in the
same ball-park as mail delivery times for small messages. As disks
and CPUs get faster, the benefits of setup/transmission dependent
connection caching become more and more limited to short distances
or large messages.

Finally, I have yet to see that a briefly idle connection is a bad
citizen problem, especially since it has reduced the total time
spent, even on the receiving side.

Wietse
Victor Duchovni
2007-09-02 02:36:35 UTC
Post by Wietse Venema
Post by Victor Duchovni
If it is off by default, we may as well not bother. A good citizenship
feature that is not on by default is not very useful :-(
It is NOT a good citizenship feature. It is a shoot-yourself-in-the-foot
feature that throws away a factor of two (or whatever) in
Postfix mail delivery performance.
It is a (perhaps impractical) attempt to discover cases where connection
caching is not in fact improving throughput, and to cache only in cases
where it does. Perhaps this is a non-problem. [ Some SOHO systems submit
to ISP MSAs with various policies to throttle per-source injection,
and avoiding re-use that is not noticeably improving throughput may in
some cases appease the ISP's controls. ].

Is this entirely a waste of time, or could this be rescued by a better
filter than a high c/d ratio?
Post by Wietse Venema
Post by Victor Duchovni
I'll do some data mining on "c/d" ratios for new and cached connections
from a few weeks of logs, and see whether the distribution is sufficiently
bi-modal to admit a natural classification into fast/slow buckets.
According to my own data, wide area setup times are already in the
same ball-park as mail delivery times for small messages. As disks
and CPUs get faster, the benefits of setup/transmission dependent
connection caching become more and more limited to short distances
or large messages.
Finally, I have yet to see that a briefly idle connection is a bad
citizen problem, especially since it has reduced the total time
spent, even on the receiving side.
The potential issue is that, with all the bots not dropping connections
lately, connection slots are increasingly a precious resource, and I
don't want to keep them idle when that can reasonably be avoided...

I'll come back after a better analysis of my logs...
--
Viktor.
Wietse Venema
2007-09-02 13:52:01 UTC
Post by Victor Duchovni
The potential issue is that, with all the bots not dropping connections
lately, connection slots are increasingly a precious resource, and I
don't want to keep them idle when that can reasonably be avoided...
That is a large red herring.

First of all, Postfix is careful. It will keep connections open
only under exceptional conditions, when it actually has a pile of
mail queued for that site, and when it has evidence that it can
keep the flow going.

Second, careful connection caching makes the total receiver-side
connect time actually SHORTER, because the receiving side is not
forced to waste its time repeatedly with 1.5 RTT HELO handshakes.
That's 1.5 RTT out of a total of 3.5 RTT for small messages that
the receiving side spends waiting.

If my server were under stress, then I would welcome clients that
need 40% less connection time to get their mail delivered. Of course
this holds only when the sender is able to maintain a sufficient
level of connection utilization, but your proposal isn't measuring
connection utilization. It measures the time wasted in the client
skipping around dead servers. It is collecting the wrong evidence.

I'm not ready to throw away a factor two in Postfix delivery
performance on the basis of wrong evidence.

Wietse
Victor Duchovni
2007-09-02 15:33:56 UTC
Post by Wietse Venema
Post by Victor Duchovni
The potential issue is that, with all the bots not dropping connections
lately, connection slots are increasingly a precious resource, and I
don't want to keep them idle when that can reasonably be avoided...
That is a large red herring.
OK, never mind for now. No harm done thinking about it...

A related question: right now the flag that tells smtp(8) to check for
cached connections is the same as the flag that causes it to place
connections in the cache, so when connection caching is turned off, it
seems that as the destination queue is drained the last set of cached
sessions may time out unused.

Would it be useful to have a flag that enables cache lookup, but does not
enable caching of new sessions? Is it possible in the queue manager to
turn off session caching in two steps: first disable new cache entries,
and later (after "window" more deliveries are scheduled) also disable
lookups?
--
Viktor.
Wietse Venema
2007-09-03 00:30:23 UTC
Post by Victor Duchovni
Post by Wietse Venema
Post by Victor Duchovni
The potential issue is that, with all the bots not dropping connections
lately, connection slots are increasingly a precious resource, and I
don't want to keep them idle when that can reasonably be avoided...
That is a large red herring.
OK, never mind for now. No harm done thinking about it...
A related question: right now the flag that tells smtp(8) to check for
cached connections is the same as the flag that causes it to place
connections in the cache, so when connection caching is turned off, it
seems that as the destination queue is drained the last set of cached
sessions may time out unused.
Would it be useful to have a flag that enables cache lookup, but does not
enable caching of new sessions? Is it possible in the queue manager to
turn off session caching in two steps: first disable new cache entries,
and later (after "window" more deliveries are scheduled) also disable
lookups?
Sure, such a refinement would improve corner cases.

Wietse

Victor Duchovni
2007-09-01 23:49:00 UTC
Post by Wietse Venema
Post by Victor Duchovni
Yes. Each time setup latency is encountered, a connection is added
to the cache and used until it expires (~300s). Not all good
sessions are added to the cache immediately (a form of slow start for
cache occupancy), but if latency is encountered for new connections often
enough, the cache fills with cached sessions, and the latency disappears.
Why is closing a successful new connection an optimization?
This "slow start" like approach only postpones the "steady state"
condition where all connections are cached anyway.
It is an optimization because, under typical conditions, we won't cache
any connections at all; slow start only happens for destinations that
do exhibit sufficiently expensive connection setup.
Post by Wietse Venema
By choosing an
arbitrary setup/transmission time ratio of 3, you're willing to
sacrifice a factor of up to 4 in mail throughput by not caching
any connections with sites that have moderate setup latencies.
Yes, the factor of 3 is not necessarily right; I have not yet figured out
the right metric for "expensive" setup. Given that this is demand caching
anyway, we could make it 1 rather than 3, and lose at most a factor
of 2 in throughput, but in practice setup will be either very fast
or very slow, so I don't think this is really a problem...
Post by Wietse Venema
If your concern is that the connection cache holds too many
connections, then that would be a cache server concern. Connection
cache resource management problems are better addressed by the
cache server, instead of by cache clients.
My concern is that we are keeping remote servers busy even when connection
caching is not necessary (fast setup).
--
Viktor.
Wietse Venema
2007-09-02 01:30:45 UTC
Post by Victor Duchovni
Post by Wietse Venema
By choosing an
arbitrary setup/transmission time ratio of 3, you're willing to
sacrifice a factor of up to 4 in mail throughput by not caching
any connections with sites that have moderate setup latencies.
Yes, the factor of 3 is not necessarily right; I have not yet figured out
the right metric for "expensive" setup. Given that this is demand caching
anyway, we could make it 1 rather than 3, and lose at most a factor
of 2 in throughput, but in practice setup will be either very fast
or very slow, so I don't think this is really a problem...
I would not lightheartedly sacrifice a factor of two in performance.
Turning off connection caching deteriorates performance unless the
setup time is an order of magnitude smaller than the mail transaction
time. With small email messages that condition is becoming less
and less likely.

Allow me to clarify.

With small email messages and fast computers (fast compared to
network latencies), delivery performance is dominated by waiting
for network round-trip times. With ESMTP command pipelining there
are only a few round-trip wait states: the TCP handshake (1.5
RTT), the HELO handshake (1.5 RTT), the DATA reply (1 RTT), and
the END-OF-DATA reply (1 RTT). With connection caching we can
reduce the time spent in "wait states" from 5 RTT to 2 RTT, at
least that's the theory.
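
(In concrete terms, at the 0.1 second coast-to-coast round trip mentioned
below, that is 1.5 + 1.5 + 1 + 1 = 5 RTT, or 0.5 seconds of waiting per
small message over a fresh connection, versus 0.2 seconds over a cached
one that pays only the DATA and END-OF-DATA round trips.)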

In reality, the numbers change due to limited CPU and disk performance.
But CPUs and disks are getting faster, and networks are getting
wider, while network round-trip times are limited by Einstein's
speed of light. This is not going to change anytime soon. It will
always take 0.1 second to go across the continental USA and back.

Flash-memory disks in the tens of Gbytes are already a reality,
and these have much lower latencies than mechanical drives. BTW
flash disks have tremendous forensic potential: internally they
implement a (proprietary) journaling file system.

Future connection setup times will become more and more similar to
message transmission times, especially for small messages and
wide-area traffic. And that is still a large if not the largest
portion of traffic handled by many Postfix servers.

Thus, the "gains" from making caching dependent on setup/delivery
time ratios are increasingly time and distance limited.
Post by Victor Duchovni
Post by Wietse Venema
If your concern is that the connection cache holds too many
connections, then that would be a cache server concern. Connection
cache resource management problems are better addressed by the
cache server, instead of by cache clients.
My concern is that we are keeping remote servers busy even when connection
caching is not necessary (fast setup).
This concern becomes less and less applicable with wide-area traffic.
As discussed above, wide-area setup times are getting into the same
ball-park as wide-area transmission times (and my limited sample
from local logfiles seems to confirm this).

See also my other reply: Postfix won't engage in connection caching
just because it has a non-empty active queue. The knowledge for
decisions on connection caching is in the queue manager. Having
the SMTP client fight recommendations by its own queue manager is
unproductive. For one thing, the client can't know the mix of
small and large messages.

Finally, network latency is not everything. As you observed,
connection caching can save lots of CPU cycles with servers that
fork and/or exec for every connection (qmail comes to mind). It's
another aspect that the Postfix SMTP client simply cannot take into
consideration.

Wietse