Discussion:
Support for OpenLDAP MDB library
(too old to reply)
Howard Chu
2012-09-20 22:29:20 UTC
Permalink
The attached patches add support for OpenLDAP's MDB (Memory-Mapped Database)
library to the postfix-2.10-20120908 source.

I don't consider this a final patch; pretty sure it works but also would like
feedback on how some things were done. Also, I didn't see any instructions
anywhere on how to write the corresponding _README file, it kind of looks like
nroff output. manpage updates also need to be written.

The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."

I think this was a necessary feature for the dict interface in general;
arbitrary apps shouldn't need to know anything about how a particular table
implementation handles locking. In particular, dict implementations ought to
be free to use semaphores, process-shared mutexes, or anything else that comes
in handy.

I'm happy to write up the corresponding doc updates, when someone tells me
where to begin. In the meantime, you can read up on our MDB library here:
http://highlandsun.com/hyc/mdb/
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Wietse Venema
2012-09-20 23:29:48 UTC
Permalink
Post by Howard Chu
The attached patches add support for OpenLDAP's MDB (Memory-Mapped Database)
library to the postfix-2.10-20120908 source.
I don't consider this a final patch; pretty sure it works but also would like
feedback on how some things were done. Also, I didn't see any instructions
anywhere on how to write the corresponding _README file, it kind of looks like
nroff output. manpage updates also need to be written.
The README source files are in the "proto" subdirectory. They are
transformed into hyperlinked HTML and ASCII text, under control of
appropriate Makefiles. The embedded manpages in source files are
managed separately.
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
In the case of Postfix, most daemon applications are readers, while
postmap and postalias are writers. The readers use per-query locks,
while the writers use a global lock while they rewrite the table
from scratch.

In the case of DBM (which consists of two files) that is the best
one can do. In the case of CDB and (soon) Berkeley DB it's better
to rename the file into place after rewriting it from scratch.
Post by Howard Chu
I think this was a necessary feature for the dict interface in general;
arbitrary apps shouldn't need to know anything about how a particular table
implementation handles locking. In particular, dict implementations ought to
be free to use semaphores, process-shared mutexes, or anything else that comes
in handy.
As explained above, one size does not fit all, different apps have
different requirements.

I'll have a look at the lock handler code and see if it will cause
any new problems with the existing maps.

Wietse
Post by Howard Chu
I'm happy to write up the corresponding doc updates, when someone tells me
http://highlandsun.com/hyc/mdb/
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
[ Attachment, skipping... ]

[ Attachment, skipping... ]
Howard Chu
2012-09-21 11:18:00 UTC
Permalink
Post by Wietse Venema
In the case of Postfix, most daemon applications are readers, while
postmap and postalias are writers. The readers use per-query locks,
while the writers use a global lock while they rewrite the table
from scratch.
In the case of DBM (which consists of two files) that is the best
one can do. In the case of CDB and (soon) Berkeley DB it's better
to rename the file into place after rewriting it from scratch.
Ah, I totally overlooked the handling of O_TRUNC in my dict_mdb.c, will send
an updated patch shortly.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Wietse Venema
2012-09-20 23:56:21 UTC
Permalink
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
That change is unnecessary. All you need is a way to tell the caller
that it must not perform explicit lock management.

You can satisfy that need that without damage to existing code,
simply by introducing a new flag that the caller can test.

I already have flags with which a map can announce its properties.
For example, if the map can be used for security-sensitive data,
or if it matches the lookup keys against fixed strings or against
patterns.

You're welcome to add another flag that says "I can take care of
locks if I need to, thank you very much".

Wietse
Howard Chu
2012-09-21 07:31:48 UTC
Permalink
Post by Wietse Venema
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
That change is unnecessary. All you need is a way to tell the caller
that it must not perform explicit lock management.
You can satisfy that need that without damage to existing code,
simply by introducing a new flag that the caller can test.
Such a change strikes me as being a greater code maintenance burden in the
future, as every caller to the dict functions must check your proposed flag
explicitly.

As you said in your previous reply, one size does not fit all. But your
existing code assumes that there is only one type of locking for all dict types.

It doesn't make sense for code like tls/tls_scache.c to assume that just
because dict->lock_fd hasn't been set, that no form of locking is supported.
Indeed, exposing internal details of how a dict module handles locks to its
callers is more of an abstraction layer violation than a feature.
Post by Wietse Venema
I already have flags with which a map can announce its properties.
For example, if the map can be used for security-sensitive data,
or if it matches the lookup keys against fixed strings or against
patterns.
You're welcome to add another flag that says "I can take care of
locks if I need to, thank you very much".
I can certainly do this if that's really the way you want to go. But I'd ask
you to reconsider...
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Wietse Venema
2012-09-21 10:46:46 UTC
Permalink
Post by Wietse Venema
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
That change is unnecessary. All you need is a way to tell the caller
that it must not perform explicit lock management.
You can satisfy that need that without damage to existing code,
simply by introducing a new flag that the caller can test.
I already have flags with which a map can announce its properties.
For example, if the map can be used for security-sensitive data,
or if it matches the lookup keys against fixed strings or against
patterns.
You're welcome to add another flag that says "I can take care of
locks if I need to, thank you very much".
On second consideration, this may also help with memcache deployment.
I'll investigate further - changes in table infrastructure can
affect a lot of Postfix.

Wietse
Wietse Venema
2012-09-24 16:30:30 UTC
Permalink
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
I have added a lock function pointer to the dictionary data structure,
so that a specific dictionary can override it if needed.

This locking function is not exposed to applications: it is used
only by dict_open(). Applications rely on the dict_open()
"lock-on-open" feature for maps that aren't multi-writer safe.

A multi-writer safe table like MDB could override this function
with a NOOP (or it could leave the lock_fd member equal to -1 and
achieve the exact same result).

In the case of postmap/postalias which truncate databases, a global
lock is needed to prevent readers from looking at an incomplete
database. This has nothing to do with the dict_open() "lock-on-open"
feature for tables that aren't multi-writer safe.

Wietse
Howard Chu
2012-10-09 15:59:09 UTC
Permalink
Post by Wietse Venema
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
I have added a lock function pointer to the dictionary data structure,
so that a specific dictionary can override it if needed.
This locking function is not exposed to applications: it is used
only by dict_open(). Applications rely on the dict_open()
"lock-on-open" feature for maps that aren't multi-writer safe.
A multi-writer safe table like MDB could override this function
with a NOOP (or it could leave the lock_fd member equal to -1 and
achieve the exact same result).
In the case of postmap/postalias which truncate databases, a global
lock is needed to prevent readers from looking at an incomplete
database. This has nothing to do with the dict_open() "lock-on-open"
feature for tables that aren't multi-writer safe.
OK, this is an updated MDB patch, with a bit of documentation added as well.
Based on the postfix-2.10-20121007 source.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Howard Chu
2012-10-09 16:09:00 UTC
Permalink
Post by Howard Chu
Post by Wietse Venema
Post by Howard Chu
The first patch adds a lock handler to the dict interface. I needed this
first, because MDB is fully transactional and does its own locking/concurrency
management. In particular, it does MVCC so readers always run lockless; since
they're fully isolated there's no actual need for "shared locks."
I have added a lock function pointer to the dictionary data structure,
so that a specific dictionary can override it if needed.
This locking function is not exposed to applications: it is used
only by dict_open(). Applications rely on the dict_open()
"lock-on-open" feature for maps that aren't multi-writer safe.
A multi-writer safe table like MDB could override this function
with a NOOP (or it could leave the lock_fd member equal to -1 and
achieve the exact same result).
In the case of postmap/postalias which truncate databases, a global
lock is needed to prevent readers from looking at an incomplete
database. This has nothing to do with the dict_open() "lock-on-open"
feature for tables that aren't multi-writer safe.
OK, this is an updated MDB patch, with a bit of documentation added as well.
Based on the postfix-2.10-20121007 source.
Of course I screwed up the update. Needs this addition:

diff --git a/src/global/mkmap_mdb.c b/src/global/mkmap_mdb.c
index d55c7a7..c22eecf 100644
--- a/src/global/mkmap_mdb.c
+++ b/src/global/mkmap_mdb.c
@@ -82,6 +82,8 @@ MKMAP *mkmap_mdb_open(const char *path)
* Fill in the generic members.
*/
mkmap->open = dict_mdb_open;
+ mkmap->after_open = 0;
+ mkmap->after_close = 0;

/*
* MDB uses MVCC so it needs no special lock management here.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Howard Chu
2012-10-09 22:16:37 UTC
Permalink
One more...
Post by Howard Chu
Post by Howard Chu
OK, this is an updated MDB patch, with a bit of documentation added as well.
Based on the postfix-2.10-20121007 source.
diff --git a/src/global/mkmap_mdb.c b/src/global/mkmap_mdb.c
index d55c7a7..c22eecf 100644
--- a/src/global/mkmap_mdb.c
+++ b/src/global/mkmap_mdb.c
@@ -82,6 +82,8 @@ MKMAP *mkmap_mdb_open(const char *path)
* Fill in the generic members.
*/
mkmap->open = dict_mdb_open;
+ mkmap->after_open = 0;
+ mkmap->after_close = 0;
/*
* MDB uses MVCC so it needs no special lock management here.
My apologies for sending an inadequately tested patch. This is also needed for
read-only access/enumeration to work:

diff --git a/src/util/dict_mdb.c b/src/util/dict_mdb.c
index fea9af5..b9d4e4a 100644
--- a/src/util/dict_mdb.c
+++ b/src/util/dict_mdb.c
@@ -460,7 +460,7 @@ DICT *dict_mdb_open(const char *path, int open_flags,
int dict_flags)
if ((status = mdb_env_open(env, mdb_path, env_flags, 0644)))
msg_fatal("env_open %s: %s", mdb_path, mdb_strerror(status));

- if ((status = mdb_txn_begin(env, NULL, 0, &txn)))
+ if ((status = mdb_txn_begin(env, NULL, env_flags & MDB_RDONLY, &txn)))
msg_fatal("txn_begin %s: %s", mdb_path, mdb_strerror(status));

/* mdb_open requires a txn, but since the default DB always exists
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Howard Chu
2012-10-10 01:04:42 UTC
Permalink
Post by Howard Chu
My apologies for sending an inadequately tested patch. This is also needed for
I suggest letting the code cool down for 24 hours. I used to have
code freezes of a week in TCP Wrapper days. A week is unfortunately
no longer an option as things evolve at internet speeds. But 24
hours is still doable.
Looks like I still missed a few places in the docs. Need to add a description
of mdb_map_size to proto/postconf.proto and add MDB_README to the
proto/Makefile.in.
I'm also wondering if it's necessary to provide a config keyword for MDB's
max_readers parameter. The default is 126. If there can be more than 126
threads (and/or processes) reading concurrently from a single table, this
needs to be settable.
If each Postfix process counts as one reader, then a limit of 128
won't be sufficient. The default_process_limit value is 100. That
means there can be 100 smtpd processes and 100 cleanup processes
OK, reposting the entire patch set with this taken into account.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Loading...