Wednesday, February 27, 2008

More OpenLDAP

I tested OpenLDAP multimaster replication some more today. I went back to the official relase version 2.4.8. I ran my test on an AMD 64 bit machine with both servers on the same box--different ports and different installation folders. Things ran fine for the most part. Synchronization was consistent in loading and removing 5000 entries like this:

dn: cn=Fred XX_0,dc=mgm,dc=com
objectClass: inetOrgPerson
objectClass: top
givenName: Fred
sn: XX
cn: Fred XX_0

I don't know much about the berkeley db, but my issues seemed to happen after abrupt shutdowns and what I think are database corruptions. slapd cored on each server in the same place at different times during the testing:

Server 1 Core:

#0 is_ad_subtype (sub=0x0, super=0x832430) at ad.c:489
#1 0x00000000004213a7 in attrs_find (a=0x936998, desc=0x832430) at attr.c:647
#2 0x00000000004352bf in test_ava_filter (op=0x41801640, e=0x914a18, ava=0x41800fb0, type=163) at filterentry.c:617
#3 0x0000000000435761 in test_filter (op=0x41801640, e=0x914a18, f=0x41800fd0) at filterentry.c:88
#4 0x0000000000480d9b in bdb_search (op=0x41801640, rs=0x41800ec0) at search.c:845
#5 0x0000000000470be2 in overlay_op_walk (op=0x41801640, rs=0x41800ec0, which=op_search, oi=0x87b2b0, on=0x0) at backover.c:653
#6 0x00000000004710d5 in over_op_func (op=0x41801640, rs=0x41800ec0, which=op_search) at backover.c:705
#7 0x000000000046a56e in syncrepl_entry (si=0x8a31b0, op=0x41801640, entry=0x90add8, modlist=0x418015a8, syncstate=1,
syncUUID=, syncCSN=0x0) at syncrepl.c:1989
#8 0x000000000046c395 in do_syncrep2 (op=0x41801640, si=0x8a31b0) at syncrepl.c:844
#9 0x000000000046de8c in do_syncrepl (ctx=0x41801df0, arg=) at syncrepl.c:1226
#10 0x000000000041a692 in connection_read_thread (ctx=0x41801df0, argv=) at connection.c:1213
#11 0x00000000004fa6f4 in ldap_int_thread_pool_wrapper (xpool=0x83c890) at tpool.c:625
#12 0x00000035f1e06407 in start_thread () from /lib64/
#13 0x00000035f12d4b0d in clone () from /lib64/

After this happened, a restart of slapd would always hang on either server with this stack trace:

#0 0x00000035f1e076dd in pthread_join () from /lib64/
#1 0x00000000004e4501 in syncprov_db_open (be=0x8a27c0, cr=) at syncprov.c:2632
#2 0x0000000000470868 in over_db_func (be=0x8a27c0, cr=0x7fffad390610, which=) at backover.c:62
#3 0x0000000000427350 in backend_startup_one (be=0x8a27c0, cr=0x7fffad390610) at backend.c:224
#4 0x000000000042761a in backend_startup (be=0x8a27c0) at backend.c:316
#5 0x000000000040505a in main (argc=4, argv=0x7fffad3908d8) at main.c:932

I'm happy to see Gavin's interest in my blog. It shows his dedication--a tribute to the OpenSource community.


Gavin Henry said...

Did you file an ITS ( We need that you know ;-)

I find your posts and others via serveral feeds. Yours came up via:

Kind Regards,

Gavin Henry.
OpenLDAP Engineering Team.


Community developed LDAP software.

Michael Martin said...

Hi Gavin,

Thanks for you interest in my blog.

I tested OpenLDAP 2.4.9, and I no longer see the stability issues with the slapd process I was seeing in 2.4.8. I talk about it more here: openldap-249-multi-master-replication

Gavin Henry said...

Excellent news.