MacOS X poll on /dev/random still borked
Wietse Venema
2013-03-25 15:15:29 UTC
Postfix snapshot 20130324 uses kqueue() for MacOS X 8.x in Postfix
event handling routines (instead of using select()).

Unfortunately, we missed one MacOS bug.

When Postfix uses kqueue() for event handling, it relies on poll()
to enforce time limits on individual read/write operations. Prior
to snapshot 20130324, Postfix on MacOS X would use select() for
both event handling and for time-limiting individual read/write
operations.

Viktor Dukhovni reports that MacOS poll() support is still broken
for /dev/urandom. This breaks tlsmgr(8), as discussed in:

http://archives.neohapsis.com/archives/postfix/2009-12/thread.html#805
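
For readers who want to check their own system, here is a minimal
sketch of such a probe. It is not taken from Postfix; the function name
sketch_poll_device() and its return convention are made up for this
illustration. It opens a device, asks poll() whether it is readable,
and returns the reported event bits. On a system where poll() supports
devices one would expect POLLIN for /dev/urandom. How exactly the
failure shows up on MacOS is not stated in this thread, so the sketch
only reports what poll() returns.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical probe, not Postfix code. Returns the revents bits that
 * poll() reported for the device, 0 if poll() timed out, or -1 if the
 * device could not be opened or poll() itself failed.
 */
int sketch_poll_device(const char *path, int timeout_ms)
{
    struct pollfd pfd;
    int     rc;

    assert(path != 0);

    if ((pfd.fd = open(path, O_RDONLY)) < 0)
        return -1;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    close(pfd.fd);

    if (rc < 0)
        return -1;
    /* Caller should test the result for POLLIN, POLLNVAL, POLLERR. */
    return rc == 0 ? 0 : pfd.revents;
}
```

A caller would test the result of sketch_poll_device("/dev/urandom",
1000) for POLLIN, and treat POLLNVAL, POLLERR, or -1 as a sign that
poll() cannot be used on that device.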

Workaround:

$ make makefiles "CCARGS=-DNO_KQUEUE"

I wrote a quick program to test if instead of poll() Postfix could
use kqueue() but that program already fails on FreeBSD (fatal:
kevent EV_ADD: Operation not supported by device). As FreeBSD is
a major provider of MacOS kernel code, I've decided not to pursue
this path further.

Postfix insists on using {read,write}_wait() and {read,writ}able()
with each individual read/write operation, because the program must
never block forever. I'm sure many system administrators appreciate
that Postfix does not lock up easily.

Postfix wants to use poll() instead of select() in {read,write}_wait()
and {read,writ}able(), because these functions may be called with
file descriptors >= FD_SETSIZE, the maximum number of descriptors
that fits in a file descriptor set used by select().
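
As an illustration of the poll()-based approach described above, here
is a minimal sketch. It is not the Postfix source: the names
sketch_read_wait() and sketch_timed_read() and the error convention
(errno set to ETIMEDOUT on timeout) are assumptions for this example.
Because poll() takes an array of descriptor numbers instead of a
fixed-size bitset, it has no FD_SETSIZE limit.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable. Returns 0 when a
 * read() will not block, -1 with errno == ETIMEDOUT after timeout_sec
 * seconds, or -1 with another errno on error. A negative timeout
 * means wait without limit.
 */
int sketch_read_wait(int fd, int timeout_sec)
{
    struct pollfd pfd;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;

    for (;;) {
        pfd.revents = 0;
        switch (poll(&pfd, 1, timeout_sec < 0 ? -1 : timeout_sec * 1000)) {
        case -1:
            if (errno != EINTR)
                return -1;
            continue;           /* simplification: restart full timeout */
        case 0:
            errno = ETIMEDOUT;
            return -1;
        default:
            if (pfd.revents & POLLNVAL) {
                errno = EBADF;
                return -1;
            }
            return 0;           /* data, EOF, or error is pending */
        }
    }
}

/* Hypothetical wrapper: read() that gives up after timeout_sec seconds. */
ssize_t sketch_timed_read(int fd, void *buf, size_t len, int timeout_sec)
{
    if (timeout_sec > 0 && sketch_read_wait(fd, timeout_sec) < 0)
        return -1;
    return read(fd, buf, len);
}
```

A write-side variant would look the same with POLLOUT in place of
POLLIN and write() in place of read().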

Also relevant is that Postfix servers that manage many file handles
leave some descriptors < 128 unused for the benefit of (non-Postfix)
library code that wants to use select() internally.
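
One way to keep low-numbered descriptors free is to move a descriptor
to a higher number right after it is created. The sketch below does
that with fcntl(F_DUPFD). Whether Postfix does it exactly this way is
an assumption; the function name sketch_move_fd_up() and the macro
SKETCH_LOW_FD_RESERVE are invented for this example, and only the
threshold 128 comes from the text above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define SKETCH_LOW_FD_RESERVE 128       /* threshold named in the text */

/*
 * Hypothetical helper: if fd is below the reserved range, duplicate
 * it to the lowest free descriptor >= SKETCH_LOW_FD_RESERVE and close
 * the original. Returns the descriptor to use from now on, or -1 on
 * error (the original stays open in that case).
 *
 * Note: F_DUPFD does not copy the close-on-exec flag, so a caller
 * that relies on it has to set FD_CLOEXEC again on the result.
 */
int sketch_move_fd_up(int fd)
{
    int     new_fd;

    assert(fd >= 0);

    if (fd >= SKETCH_LOW_FD_RESERVE)
        return fd;                      /* already out of the way */
    if ((new_fd = fcntl(fd, F_DUPFD, SKETCH_LOW_FD_RESERVE)) < 0)
        return -1;
    close(fd);
    return new_fd;
}
```

The low numbers that this frees up remain available to library code
that calls select() with its own descriptors.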

I see the following options:

- Until MacOS X is fixed, keep using select() for event handling
(and to enforce time limits on each read/write operation). It's
not primarily a server platform anyway. This is what I have chosen
as an initial solution (i.e. Postfix works exactly as before).

- Use kqueue() for event handling, and use select() instead of
poll() to enforce time limits on each read/write operation, but
increase FD_SETSIZE at compile time. The FreeBSD and Darwin
select() manpages document this as a legitimate way to handle
larger file descriptor numbers. This means dragging kbyte-size
bitsets into and out of the select() interface. Such code is slow,
and to avoid this, kqueue() was created (followed soon by Solaris
and Linux equivalents).

- Use kqueue() for event handling, and use select() instead of
poll() to enforce time limits on each read/write operation, but
add code that dup()s a descriptor >= FD_SETSIZE to a temporary
descriptor with a lower number, and select() on that temporary
descriptor instead. This might work (we're unlikely to encounter
POSIX fcntl() locking brain damage on non-file objects), but burns
more CPU cycles. On the other hand, the odds that Postfix will
handle file descriptors >= FD_SETSIZE are small on MacOS X.
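
The third option can be sketched as follows. This is an illustration
under stated assumptions, not the code that went into Postfix: the
name sketch_select_read_wait() and the ETIMEDOUT/EMFILE error
convention are made up here, and only the read side is shown. A
write-side variant would pass the descriptor set as the third
select() argument instead of the second.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait for fd to become readable using select().
 * A descriptor >= FD_SETSIZE is first dup()ed to a temporary, lower
 * number. Returns 0 when readable, -1 with errno == ETIMEDOUT on
 * timeout, -1 with another errno on error.
 */
int sketch_select_read_wait(int fd, int timeout_sec)
{
    fd_set  rfds;
    struct timeval tv;
    int     tmp_fd = fd;
    int     rc;
    int     saved_errno;

    assert(fd >= 0);

    if (fd >= FD_SETSIZE) {
        /* dup() returns the lowest-numbered free descriptor. */
        if ((tmp_fd = dup(fd)) < 0)
            return -1;
        if (tmp_fd >= FD_SETSIZE) {     /* no low descriptor was free */
            close(tmp_fd);
            errno = EMFILE;
            return -1;
        }
    }
    do {
        /* Rebuild the arguments: select() may modify them. */
        FD_ZERO(&rfds);
        FD_SET(tmp_fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(tmp_fd + 1, &rfds, (fd_set *) 0, (fd_set *) 0,
                    timeout_sec < 0 ? (struct timeval *) 0 : &tv);
    } while (rc < 0 && errno == EINTR);

    saved_errno = (rc == 0 ? ETIMEDOUT : errno);
    /*
     * Closing the temporary descriptor is where POSIX fcntl() locks
     * would bite: close() on any descriptor for a file drops all of
     * the process's fcntl() locks on that file. Sockets and pipes
     * are not affected.
     */
    if (tmp_fd != fd)
        close(tmp_fd);
    if (rc <= 0) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

The extra dup() and close() calls are the additional CPU cycles
mentioned above; they are only spent when the descriptor number is
actually too large for select().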

Wietse
Axel Luttgens
2013-03-25 17:13:36 UTC
Post by Wietse Venema
Postfix snapshot 20130324 uses kqueue() for MacOS X 8.x in Postfix
event handling routines (instead of using select()).
Unfortunately, we missed one MacOS bug.
When Postfix uses kqueue() for event handling, it relies on poll()
to enforce time limits on individual read/write operations. Prior
to snapshot 20130324, Postfix on MacOS X would use select() for
both event handling and for time-limiting individual read/write
operations.
Hello Wietse,

Would have been too nice, probably. :-(

And yet more maddening, it is clearly stated in the man page:

BUGS
The poll() system call currently does not support devices.
Post by Wietse Venema
Viktor Dukhovni reports that MacOS poll() support is still broken
http://archives.neohapsis.com/archives/postfix/2009-12/thread.html#805
$ make makefiles "CCARGS=-DNO_KQUEUE"
[...]
[...]
I quickly looked at what Apple did for that problem; it seems to be contained in:

http://www.opensource.apple.com/source/postfix/postfix-247/postfix/src/util/iostuff.h
http://www.opensource.apple.com/source/postfix/postfix-247/postfix/src/util/read_wait.c
http://www.opensource.apple.com/source/postfix/postfix-247/postfix/src/util/timed_read.c
http://www.opensource.apple.com/source/postfix/postfix-247/postfix/src/tls/tls_prng_dev.c

I don't know whether this is a good idea, but it seems potentially transposable to the general case of OSes that don't allow poll() on /dev/urandom, while preserving the overall logic.

Axel
Wietse Venema
2013-03-25 17:45:21 UTC
Post by Wietse Venema
Viktor Dukhovni reports that MacOS poll() support is still broken
http://archives.neohapsis.com/archives/postfix/2009-12/thread.html#805
...
That appears to be a special-case subset of my third solution
(enforce read/write time limits with select() instead of poll()).

Their solution a) handles reading only, and b) unconditionally fails
on descriptors >= FD_SETSIZE. My solution handles reading and
writing, and tries to dup() descriptors >= FD_SETSIZE down, which
will practically eliminate the problem.

Wietse
Axel Luttgens
2013-03-25 18:14:12 UTC
Yes, I was just busy writing an add-on to my previous mail, when your reply arrived...

In fact, it was the idea of making use of the unused_context parameter while concentrating the bulk of the change into a single place that appealed to me; but of course with a nicer handling of the fd >= FD_SETSIZE case, by duplicating the fd when needed, as you suggested.

Now, I didn't think about the case of writing to a device; would this be needed in the case of Postfix too?

Axel
Wietse Venema
2013-03-25 18:42:43 UTC
Axel Luttgens:
Now, I didn't think about the case of writing to a device; would
this be needed in the case of Postfix too?
Sorry, Postfix must be able to evolve. That is in a direct conflict
with the existence of partial solutions (such as an infrastructure
for read/write timeouts that can't handle all I/O resources).


Wietse
