particularly help programs which load many shared libraries with
a lot of relocations. Large C++ programs such as are found in KDE
are a prime example.
While relocating a shared object, maintain a vector of symbols
which have already been looked up, directly indexed by symbol
number. Typically, symbols which are referenced by a relocation
entry are referenced by many of them. This is the same optimization
I made to the a.out dynamic linker in 1995 (rtld.c revision 1.30).
Also, compare the first character of a sought-after symbol with its
symbol table entry before calling strcmp().
On a PII/400 these changes reduce the start-up time of a typical
KDE program from 833 msec (elapsed) to 370 msec.
MFC after: 5 days
longer includes machine/elf.h.
* consumers of elf.h now use the minimalist elf header possible.
This change is motivated by Binutils 2.11.0 and too much clashing over
our base elf headers and the Binutils elf headers.
Formerly the init functions were called in the opposite of the
order in which libraries were loaded, and libraries were loaded
according to a breadth-first traversal of the dependency graph.
That ordering came from SVR4.0, and it was easy to implement but
not always sensible.
Now we do a depth-first walk over the dependency graph and call
the init functions in an order such that each shared object's needed
objects are initialized before the shared object itself. At the
same time we build a list of finalization (fini) functions in the
opposite order, to guarantee correct C++ destructor ordering whenever
possible. (It may not be possible if dlopen and dlclose are used
in strange ways, but we come as close as one can come.)
The need for this renovation has become apparent as more programs
have started using multithreading. The multithreaded C library
libc_r requires initialization, whereas the standard libc does not.
Since virtually every other object depends on the C library, it is
important that it get initialized first.
and for all (I hope). Packages such as wine, JDK, and linuxthreads
should no longer have any problems with re-entering the dynamic
linker.
This commit replaces the locking used in the dynamic linker with a
new spinlock-based reader/writer lock implementation. Brian
Fundakowski Feldman <green> argued for this from the very beginning,
but it took me a long time to come around to his point of view.
Spinlocks are the only kinds of locks that work with all thread
packages. But on uniprocessor systems they can be inefficient,
because while a contender for the lock is spinning the holder of the
lock cannot make any progress toward releasing it. To alleviate
this disadvantage I have borrowed a trick from Sleepycat's Berkeley
DB implementation. When spinning for a lock, the requester does a
nanosleep() call for 1 usec. each time around the loop. This will
generally yield the CPU to other threads, allowing the lock holder
to finish its business and release the lock. I chose 1 usec. as the
minimum sleep which would with reasonable certainty not be rounded
down to 0.
The formerly machine-independent file "lockdflt.c" has been moved
into the architecture-specific subdirectories by repository copy.
It now contains the machine-dependent spinlocking code. For the
spinlocks I used the very nifty "simple, non-scalable reader-preference
lock" which I found at
<http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html>
on all CPUs except the 80386 (the specific CPU model, not the
architecture). The 80386 CPU doesn't support the necessary "cmpxchg"
instruction, so on that CPU a simple exclusive test-and-set lock
is used instead. 80386 CPUs are detected at initialization time by
trying to execute "cmpxchg" and catching the resulting SIGILL
signal.
To reduce contention for the locks, I have revamped a couple of
key data structures, permitting all common operations to be done
under non-exclusive (reader) locking. The only operations that
require exclusive locking now are the rare intrusive operations
such as dlopen() and dlclose().
The dllockinit() interface is now deprecated. It still exists,
but only as a do-nothing stub. I plan to remove it as soon as is
reasonably possible. (From the very beginning it was clearly
labeled as experimental and subject to change.) As far as I know,
only the linuxthreads port uses dllockinit(). This interface turned
out to have several problems. As one example, when the dynamic
linker called a client-supplied locking function, that function
sometimes needed lazy binding, causing re-entry into the dynamic
linker and a big looping mess. And in any case, it turned out to be
too burdensome to require threads packages to register themselves
with the dynamic linker.
figure out which shared object(s) contain the the locking methods
and fully bind those objects as if they had been loaded with
LD_BIND_NOW=1. The goal is to keep the locking methods from
requiring any lazy binding. Otherwise infinite recursion occurs
in _rtld_bind.
This fixes the infinite recursion problem in the linuxthreads port.
init and fini functions. Now the code is very careful to hold no
locks when calling these functions. Thus the dynamic linker cannot
be re-entered with a lock already held.
Remove the tolerance for recursive locking that I added in revision
1.2 of dllockinit.c. Recursive locking shouldn't happen any more.
Mozilla and JDK users: I'd appreciate confirmation that things still
work right (or at least the same) with these changes.
functions to be used by the dynamic linker. This can be called by
threads packages at start-up time. I will add the call to libc_r
soon.
Also add a default locking method that is used up until dllockinit()
is called. The default method works by blocking SIGVTALRM, SIGPROF,
and SIGALRM in critical sections. It is based on the observation
that most user-space threads packages implement thread preemption
with one of these signals (usually SIGVTALRM).
The dynamic linker has never been reentrant, but it became less
reentrant in revision 1.34 of "src/libexec/rtld-elf/rtld.c".
Starting with that revision, multiple threads each doing lazy
binding could interfere with each other. The usual symptom was
that a symbol was falsely reported as undefined at start-up time.
It was rare but not unseen. This commit fixes it.
libjava peeks into the dynamic linker's private Obj_Entry structures.
My recent changes introduced some new members near the front of
the structures, causing libjava to get the wrong fields. This commit
moves the new members toward the end of the structure so that the
layout of the portion that is relevant to JDK remains the same as
before.
I will work with the JDK porting team to see if we can come up with
a less fragile way for them to do what they need to do. I understand
the current approach was necessary in order to work around some
limitations of the dynamic linker. Maybe it's not necessary any
more.
PT_INTERP program header entry, to ensure that gdb always finds
the right dynamic linker.
Use obj->relocbase to simplify a few calculations where appropriate.
loaded separately by dlopen that have global symbols with identical
names. Viewing each dlopened object as a DAG which is linked by its
DT_NEEDED entries in the dynamic table, the search order is as
follows:
* If the referencing object was linked with -Bsymbolic, search it
internally.
* Search all dlopened DAGs containing the referencing object.
* Search all objects loaded at program start up.
* Search all objects which were dlopened() using the RTLD_GLOBAL
flag (which is now supported too).
The search terminates as soon as a strong definition is found.
Lacking that, the first weak definition is used.
These rules match those of Solaris, as best I could determine them
from its vague manual pages and the results of experiments I performed.
PR: misc/12438
the Makefile, and move it down into the architecture-specific
subdirectories.
Eliminate an asm() statement for the i386.
Make the dynamic linker work if it is built as an executable instead
of as a shared library. See i386/Makefile.inc to find out how to
do it. Note, this change is not enabled and it might never be
enabled. But it might be useful in the future. Building the
dynamic linker as an executable should make it start up faster,
because it won't have any relocations. But in practice I suspect
the difference is negligible.
or Elf64 based on the inclusion of the machine dependent header.
I've left the addition of the extra fields to handle the relocation
structures with addend for a separate commit after jdp has had a chance
to review what I've done. The current change is needed to compile
csu/alpha/crt1.c
quite a few enhancements and bug fixes. There are still some known
deficiencies, but it should be adequate to get us started with ELF.
Submitted by: John Polstra <jdp@polstra.com>