3rd release of v4
Rewrite the mutex implementation for better performance.

    This mutex reimplementation attempts to optimise for the common case:
    default mutex type and no contention. Allocation of expensive
    resources (heap memory and Windows kernel objects) is delayed until
    needed, which may be never if the lock isn't subsequently used or
    contended.
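
    As a rough illustration of the lazy-allocation idea above, and not
    the code in this commit, the sketch below uses C11 atomics. The names
    lazy_mutex, slow_state, lazy_trylock, lazy_get_slow and lazy_unlock
    are hypothetical; slow_state merely stands in for an expensive
    resource such as heap memory or a kernel object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an expensive resource (heap block, kernel event). */
typedef struct {
    int waiters;
} slow_state;

/* Hypothetical lock: one atomic word, plus a pointer that stays NULL
 * until the slow path is first needed. */
typedef struct {
    atomic_int locked;           /* 0 = free, 1 = held */
    _Atomic(slow_state *) slow;  /* allocated lazily */
} lazy_mutex;

#define LAZY_MUTEX_INIT { 0, NULL }

/* Fast path: a single compare-and-swap, no allocation. */
static bool lazy_trylock(lazy_mutex *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->locked, &expected, 1);
}

/* Slow path helper: allocate the expensive state on first use only.
 * If two threads race, the loser frees its copy and uses the winner's. */
static slow_state *lazy_get_slow(lazy_mutex *m)
{
    slow_state *existing = atomic_load(&m->slow);
    if (existing != NULL)
        return existing;

    slow_state *fresh = calloc(1, sizeof *fresh);
    if (fresh == NULL)
        return NULL;

    slow_state *expected = NULL;
    if (atomic_compare_exchange_strong(&m->slow, &expected, fresh))
        return fresh;

    free(fresh);
    return expected; /* holds the pointer installed by the other thread */
}

static void lazy_unlock(lazy_mutex *m)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load(&m->locked) == 1);
    atomic_store(&m->locked, 0);
}
```

    In this sketch an uncontended lock/unlock pair never touches the
    allocator; only a thread that fails lazy_trylock would call
    lazy_get_slow before blocking. The blocking and wake-up logic is
    deliberately left out.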

    The global spinlock of the old implementation is gone; performance is
    orders of magnitude better and scales nicely with multiple
    threads. This has been confirmed both in micro-benchmarks and actual
    applications.

    The reimplementation is fully binary compatible. It does far less
    error checking than the old one, but should stay strictly within the
    limits of what POSIX allows.

    Code in thread.c relied on undefined behaviour (calling
    pthread_mutex_destroy twice on the same mutex), so an explicit
    reinitialisation had to be added.
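
    A minimal sketch of the pattern, assuming a hypothetical helper
    named reset_mutex (the actual change in thread.c is not shown here):
    once a mutex has been destroyed, it is reinitialised with
    pthread_mutex_init before any further use, including a later
    destroy.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: destroy a mutex and return it to a usable state,
 * so that later lock/unlock/destroy calls are well defined. */
static int reset_mutex(pthread_mutex_t *m)
{
    int rc = pthread_mutex_destroy(m);
    if (rc != 0)
        return rc;

    /* A destroyed mutex may only be reinitialised; anything else,
     * including a second destroy, is undefined. */
    return pthread_mutex_init(m, NULL);
}
```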

(cherry picked from commit 1968e60cd5d59727bb325d5b69c8f0d7a2f1fe1b)
7 files changed