Discussion:
PATCH: MacOS X use select() instead of poll()
(too old to reply)
Wietse Venema
2013-03-25 19:12:38 UTC
Permalink
This patch implements the third solution that I described earlier:
use kqueue() for event handling, and use select() to enforce
read/write time limits.

This code will attempt to handle descriptors >=FD_SETSIZE, by
dup()ing them down if possible. This will in practice eliminate the
FD_SETSIZE limit with Postfix daemons (these reserve some 100 low
file descriptors for third-party libraries that use select()
internally).

Wietse

diff --exclude=man --exclude=html --exclude=README_FILES --exclude=.indent.pro --exclude=Makefile.in -ur /var/tmp/postfix-2.11-20130324/src/util/read_wait.c src/util/read_wait.c
--- /var/tmp/postfix-2.11-20130324/src/util/read_wait.c 2009-06-12 19:18:49.000000000 -0400
+++ src/util/read_wait.c 2013-03-25 14:50:06.000000000 -0400
@@ -15,8 +15,8 @@
/*
/* Arguments:
/* .IP fd
-/* File descriptor in the range 0..FD_SETSIZE (on systems that
-/* need to use select(2)).
+/* File descriptor. With implementations based on select(),
+/* a best effort is made to handle descriptors >=FD_SETSIZE.
/* .IP timeout
/* If positive, deadline in seconds. A zero value effects a poll.
/* A negative value means wait until something happens.
@@ -62,17 +62,21 @@

int read_wait(int fd, int timeout)
{
-#ifndef USE_SYSV_POLL
+#if defined(NO_SYSV_POLL)
fd_set read_fds;
fd_set except_fds;
struct timeval tv;
struct timeval *tp;
+ int temp_fd = -1;

/*
* Sanity checks.
*/
- if (FD_SETSIZE <= fd)
- msg_panic("descriptor %d does not fit FD_SETSIZE %d", fd, FD_SETSIZE);
+ if (FD_SETSIZE <= fd) {
+ if ((temp_fd = dup(fd)) < 0 || temp_fd >= FD_SETSIZE)
+ msg_fatal("descriptor %d does not fit FD_SETSIZE %d", fd, FD_SETSIZE);
+ fd = temp_fd;
+ }

/*
* Use select() so we do not depend on alarm() and on signal() handlers.
@@ -99,13 +103,17 @@
msg_fatal("select: %m");
continue;
case 0:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
errno = ETIMEDOUT;
return (-1);
default:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
return (0);
}
}
-#else
+#elif defined(USE_SYSV_POLL)

/*
* System-V poll() is optimal for polling a few descriptors.
@@ -132,5 +140,7 @@
return (0);
}
}
+#else
+#error "define USE_SYSV_POLL or NO_SYSV_POLL"
#endif
}
diff --exclude=man --exclude=html --exclude=README_FILES --exclude=.indent.pro --exclude=Makefile.in -ur /var/tmp/postfix-2.11-20130324/src/util/readable.c src/util/readable.c
--- /var/tmp/postfix-2.11-20130324/src/util/readable.c 2007-02-12 10:41:11.000000000 -0500
+++ src/util/readable.c 2013-03-25 14:50:06.000000000 -0400
@@ -14,7 +14,8 @@
/*
/* Arguments:
/* .IP fd
-/* File descriptor in the range 0..FD_SETSIZE.
+/* File descriptor. With implementations based on select(),
+/* a best effort is made to handle descriptors >=FD_SETSIZE.
/* DIAGNOSTICS
/* All system call errors are fatal.
/* LICENSE
@@ -54,16 +55,20 @@

int readable(int fd)
{
-#ifndef USE_SYSV_POLL
+#if defined(NO_SYSV_POLL)
struct timeval tv;
fd_set read_fds;
fd_set except_fds;
+ int temp_fd = -1;

/*
* Sanity checks.
*/
- if (fd >= FD_SETSIZE)
- msg_fatal("fd %d does not fit in FD_SETSIZE", fd);
+ if (fd >= FD_SETSIZE) {
+ if ((temp_fd = dup(fd)) < 0 || temp_fd >= FD_SETSIZE)
+ msg_fatal("fd %d does not fit in FD_SETSIZE", fd);
+ fd = temp_fd;
+ }

/*
* Initialize.
@@ -85,12 +90,16 @@
msg_fatal("select: %m");
continue;
default:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
return (FD_ISSET(fd, &read_fds));
case 0:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
return (0);
}
}
-#else
+#elif defined(USE_SYSV_POLL)

/*
* System-V poll() is optimal for polling a few descriptors.
@@ -115,5 +124,7 @@
return (1);
}
}
+#else
+#error "define USE_SYSV_POLL or NO_SYSV_POLL"
#endif
}
diff --exclude=man --exclude=html --exclude=README_FILES --exclude=.indent.pro --exclude=Makefile.in -ur /var/tmp/postfix-2.11-20130324/src/util/sys_defs.h src/util/sys_defs.h
--- /var/tmp/postfix-2.11-20130324/src/util/sys_defs.h 2012-09-24 19:53:56.000000000 -0400
+++ src/util/sys_defs.h 2013-03-25 14:14:16.000000000 -0400
@@ -242,7 +242,7 @@
#define SOCKOPT_SIZE socklen_t
#ifndef NO_KQUEUE
# define EVENTS_STYLE EVENTS_STYLE_KQUEUE
-# define USE_SYSV_POLL
+# define NO_SYSV_POLL
#endif
#ifndef NO_POSIX_GETPW_R
# define HAVE_POSIX_GETPW_R
@@ -1368,8 +1368,8 @@
#define EVENTS_STYLE_DEVPOLL 3 /* Solaris /dev/poll */
#define EVENTS_STYLE_EPOLL 4 /* Linux epoll */

-#if !defined(USE_SYSV_POLL) && (EVENTS_STYLE != EVENTS_STYLE_SELECT)
-#error "need USE_SYSV_POLL with EVENTS_STYLE != EVENTS_STYLE_SELECT"
+#if !defined(USE_SYSV_POLL) && !defined(NO_SYSV_POLL) && (EVENTS_STYLE != EVENTS_STYLE_SELECT)
+#error "need USE_SYSV_POLL or NO_SYSV_POLL with EVENTS_STYLE != EVENTS_STYLE_SELECT"
#endif

/*
diff --exclude=man --exclude=html --exclude=README_FILES --exclude=.indent.pro --exclude=Makefile.in -ur /var/tmp/postfix-2.11-20130324/src/util/writable.c src/util/writable.c
--- /var/tmp/postfix-2.11-20130324/src/util/writable.c 2007-02-12 11:35:36.000000000 -0500
+++ src/util/writable.c 2013-03-25 14:50:06.000000000 -0400
@@ -14,7 +14,8 @@
/*
/* Arguments:
/* .IP fd
-/* File descriptor in the range 0..FD_SETSIZE.
+/* File descriptor. With implementations based on select(),
+/* a best effort is made to handle descriptors >=FD_SETSIZE.
/* DIAGNOSTICS
/* All system call errors are fatal.
/* LICENSE
@@ -54,16 +55,20 @@

int writable(int fd)
{
-#ifndef USE_SYSV_POLL
+#if defined(NO_SYSV_POLL)
struct timeval tv;
fd_set write_fds;
fd_set except_fds;
+ int temp_fd = -1;

/*
* Sanity checks.
*/
- if (fd >= FD_SETSIZE)
- msg_fatal("fd %d does not fit in FD_SETSIZE", fd);
+ if (fd >= FD_SETSIZE) {
+ if ((temp_fd = dup(fd)) < 0 || temp_fd >= FD_SETSIZE)
+ msg_fatal("fd %d does not fit in FD_SETSIZE", fd);
+ fd = temp_fd;
+ }

/*
* Initialize.
@@ -85,12 +90,16 @@
msg_fatal("select: %m");
continue;
default:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
return (FD_ISSET(fd, &write_fds));
case 0:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
return (0);
}
}
-#else
+#elif defined(USE_SYSV_POLL)

/*
* System-V poll() is optimal for polling a few descriptors.
@@ -115,5 +124,7 @@
return (1);
}
}
+#else
+#error "define USE_SYSV_POLL or NO_SYSV_POLL"
#endif
}
diff --exclude=man --exclude=html --exclude=README_FILES --exclude=.indent.pro --exclude=Makefile.in -ur /var/tmp/postfix-2.11-20130324/src/util/write_wait.c src/util/write_wait.c
--- /var/tmp/postfix-2.11-20130324/src/util/write_wait.c 2007-02-12 11:37:40.000000000 -0500
+++ src/util/write_wait.c 2013-03-25 14:50:06.000000000 -0400
@@ -15,7 +15,8 @@
/*
/* Arguments:
/* .IP fd
-/* File descriptor in the range 0..FD_SETSIZE.
+/* File descriptor. With implementations based on select(),
+/* a best effort is made to handle descriptors >=FD_SETSIZE.
/* .IP timeout
/* If positive, deadline in seconds. A zero value effects a poll.
/* A negative value means wait until something happens.
@@ -61,17 +62,21 @@

int write_wait(int fd, int timeout)
{
-#ifndef USE_SYSV_POLL
+#if defined(NO_SYSV_POLL)
fd_set write_fds;
fd_set except_fds;
struct timeval tv;
struct timeval *tp;
+ int temp_fd = -1;

/*
* Sanity checks.
*/
- if (FD_SETSIZE <= fd)
- msg_panic("descriptor %d does not fit FD_SETSIZE %d", fd, FD_SETSIZE);
+ if (FD_SETSIZE <= fd) {
+ if ((temp_fd = dup(fd)) < 0 || temp_fd >= FD_SETSIZE)
+ msg_fatal("descriptor %d does not fit FD_SETSIZE %d", fd, FD_SETSIZE);
+ fd = temp_fd;
+ }

/*
* Guard the write() with select() so we do not depend on alarm() and on
@@ -98,13 +103,17 @@
msg_fatal("select: %m");
continue;
case 0:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
errno = ETIMEDOUT;
return (-1);
default:
+ if (temp_fd != -1)
+ (void) close(temp_fd);
return (0);
}
}
-#else
+#elif defined(USE_SYSV_POLL)

/*
* System-V poll() is optimal for polling a few descriptors.
@@ -131,5 +140,7 @@
return (0);
}
}
+#else
+#error "define USE_SYSV_POLL or NO_SYSV_POLL"
#endif
}
Axel Luttgens
2013-03-27 08:33:33 UTC
Permalink
Post by Wietse Venema
use kqueue() for event handling, and use select() to enforce
read/write time limits.
This code will attempt to handle descriptors >=FD_SETSIZE, by
dup()ing them down if possible. This will in practice eliminate the
FD_SETSIZE limit with Postfix daemons (these reserve some 100 low
file descriptors for third-party libraries that use select()
internally).
Wietse
[...patch code...]
Hello Wietse,

Once again, many thanks for your contribution to this problem.

I still have to carefully read your patch, so as to be sure to have tried to understand everything. ;-)
But I believe to have understood it is kind of an "all or nothing" approach: either poll for everything, or select for everything.

So, if you allow, I have a "theoretical" question.
Let's consider a hypothetical OS capable to poll anything but devices.
On the other hand, the bulk of operations in Postfix seems to be related to sockets, fifos... (but I may be wrong, of course).
Aren't we sacrifying potential efficiency gains by having that OS making use of select exclusively?

Back to our actual OS now.
I suddenly remembered that /dev/random (or, equivalently in this case, /dev/urandom) doesn't need any special treatment: just read the wanted bytes from it, data should be immediately available, and the operation should return immediately.
Should it not behave that way, I guess this would mean the kernel is stuck anyway (I may be wrong here too, of course).

Axel
Wietse Venema
2013-03-27 14:59:30 UTC
Permalink
Post by Wietse Venema
use kqueue() for event handling, and use select() to enforce
read/write time limits.
postfix-2.11-20130327 can use select() after poll() fails. The
result will perform identically to Apple's code, but without using
special-case code for the random-device reader.

To clean up the code further, I eliminated "#ifdef MACOSX" tests.
The poll() and getrlimit() workarounds now have their own feature
macros. This allows me to enable a workaround on other systems if
needed, but the main benefit is that it makes the workarounds easier
to test on a non-MACOS system (no need to edit files).

Wietse
Axel Luttgens
2013-03-30 10:15:39 UTC
Permalink
Hello,

I just noticed I sent my reply from a wrong address, and that it has been silently discarded...
Objet : Rép : PATCH: MacOS X use select() instead of poll()
Date : 27 mars 2013 16:46:48 UTC+01:00
Post by Wietse Venema
[...]
postfix-2.11-20130327 can use select() after poll() fails. The
result will perform identically to Apple's code, but without using
special-case code for the random-device reader.
Wow, that sounds great!
No risk to become incompatible with previous implementations?
After all, I came with that slightly exotic OS and would hate to be the cause of sudden breakdowns all over the world... ;-)
Post by Wietse Venema
To clean up the code further, I eliminated "#ifdef MACOSX" tests.
The poll() and getrlimit() workarounds now have their own feature
macros. This allows me to enable a workaround on other systems if
needed, but the main benefit is that it makes the workarounds easier
to test on a non-MACOS system (no need to edit files).
Happy for that one too, since code maintenance should now be no more complicated than before; I'm feeling a bit less guilty.
BTW, I think I'll immediately jump to the 2.11.x series, since your site describe it as of production quality even if "experimental". I've thus downloaded that postfix-2.11-20130327 snapshot and will report back as soon as possible.
Thank you wholeheartedly,
Axel
So, I'm currently experimenting with the postfix-2.11-20130327 snapshot, and everything seems to be fine:

- no compilation problem
- no hangs, crashes...
- the various tests mentioned in previous threads (including quick TLS tests) have passed with this snapshot too

Sincerely,
Axel
Wietse Venema
2013-03-30 12:13:25 UTC
Permalink
Post by Axel Luttgens
So, I'm currently experimenting with the postfix-2.11-20130327
- no compilation problem
- no hangs, crashes...
- the various tests mentioned in previous threads (including quick
TLS tests) have passed with this snapshot too
That is good news, thanks for testing. These changes are more
invasive that I allow for a stable release (it is not an emergency),
so I will release with Postfix 2.11 and later only.

Wietse

Loading...