>I was waiting to see more discussion on this, since I have no
>first-hand experience, but as far as I know, *BSD currently
>implement Posix-style threads only in user-land (application-space)
>whereas Linux implements them in kernel space.

Right.  BSD uses a userland thread library to schedule threads, while
Linux's threads are scheduled by the kernel just like normal
"processes".

This means that, for BSD to properly handle threads, async I/O must
be used and handled by the thread library, otherwise when a single
thread blocks for I/O, the whole process blocks, and no other work
may be done until the I/O request is complete.  But, properly
managing async I/O from multiple callers is kinda hard (at least to
do it efficiently).
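
To make the blocking problem concrete, here is a minimal sketch of the
kind of wrapper a userland thread library might put around read().  It
uses non-blocking descriptors rather than true async I/O, which is a
simpler stand-in for the same idea.  The names try_read(),
thread_yield() and WOULD_BLOCK are made up for illustration and are
not taken from any BSD thread library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define WOULD_BLOCK (-2)   /* hypothetical "try again later" result */

/* Hypothetical scheduler hook: a real library would switch to another
 * runnable thread here.  This stub only counts the calls. */
int yield_count = 0;
void thread_yield(void) { yield_count++; }

/* Put a descriptor into non-blocking mode.  Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* One attempt at a read on behalf of a user-level thread.  If the
 * kernel has no data yet, yield instead of blocking the process. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        thread_yield();
        return WOULD_BLOCK;
    }
    return n;   /* bytes read, 0 at EOF, or -1 on a real error */
}
```

A real library would retry the read once the descriptor became
readable (via select() or a completion signal) instead of returning to
the caller; doing that bookkeeping efficiently for many threads is the
hard part.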

Linux, on the other hand, with the kernel-thread model, assigns a
kernel "task" to each executing thread in a program, and simply
allows the threads to share the vm map, fd table, etc. among
themselves (the clone() syscall under Linux handles this, taking
different flags for emulation of fork(), vfork(), etc.).  Threaded
programs under Linux, then, can use regular blocking I/O, managed
by the kernel, to do what has to be emulated on other systems in
the thread libraries.
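
As a rough illustration of the sharing described above, the following
Linux-only sketch uses the glibc clone() wrapper with CLONE_VM so that
the new task writes into the parent's memory.  The helper names
(run_shared_task, child_fn) and the 64 KB stack size are arbitrary
choices for the example, not anything a real thread library is known
to use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Lives in the parent's address space; visible to the child only
 * because CLONE_VM is passed below. */
static volatile int shared_counter = 0;

static int child_fn(void *arg)
{
    assert(arg != NULL);
    shared_counter += *(int *)arg;
    return 0;                       /* becomes the task's exit status */
}

/* Run child_fn in a new kernel task that shares our memory, wait for
 * it, and return the counter (or -1 on failure). */
int run_shared_task(int increment)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass the
     * top of the buffer.  SIGCHLD lets a plain waitpid() reap it. */
    pid_t pid = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD,
                      &increment);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_counter;
}
```

Without CLONE_VM the child would get its own copy of the address
space, as with fork(), and the parent would never see the increment.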

Solaris uses (or used to use?) a hybrid method whereby multiple
userland-scheduled threads are bound to a kernel-level "lightweight
process" (LWP), of which a given "normal process" can have more
than one.

Basically, Solaris' threads are (were?) run on a two-tiered scheduler,
with the system kernel handling I/O for a group of threads running in
a process.  I'm pretty sure the basic I/O primitives were wrapped in
their thread library (like they are in BSD) and made use of async
I/O, with the completion signals being caught and handled by the
thread scheduler in userland to wake up threads blocked for I/O when
a request was done.  Multiple paths of I/O, happening on multiple
LWPs (scheduled on possibly multiple CPUs, possibly completely
concurrently), were therefore possible.  Not sure how big a
difference (or penalty) multithreading makes on a single-CPU system,
though, in any of the models.

Irix probably follows a similar model to Solaris, but I don't know
for sure.

AFAIK, Digital has a decent implementation of pthreads, at least.  One
of their guys, Dave Butenhof, is one of the main architects of the
standard.  I think, though, that Digital's thread package operates
like that of BSD, completely in userland.

Just as a follow-on question, does anyone know the current status
of threads in Solaris?  Before pthreads was standardized, they had a
good, solid thread library that included the ability to force another
thread to sleep (pthreads, AFAIK, includes no such functionality).
How difficult would it be to implement a mezzanine-scheduled thread
package for OpenBSD, given that there is not yet support for MP?
Would it even be worth doing on a single CPU?

What I've been wondering about is having the main thread of a
program (the one active when main() is called) register an entry
point with the kernel, much like a signal handler.  When the
system's scheduler switches to that process, the kernel would check
for that entry point (skipping it if NULL) and enter that code first,
so that the userland thread library can perform any cleanups, etc.
that it needs to accomplish before the actual program code is entered
again.

This entry code would run in user mode, obviously, and would be
scheduled like normal by the kernel (except that, if the main thread
was preempted by the kernel while executing that code, there would
have to be some sort of flag set on the process and all its peers so
that none of them could be run until the thread scheduler/library
entry point had returned normally).  Upon exit from that handler, the
process' normal code could be set in motion for the balance of that
thread's time quantum.
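
To pin the idea down a little, here is a toy userland simulation of
that dispatch path.  Nothing here is a real kernel interface:
struct sim_process, register_resume_hook() and sim_switch_to() are all
invented for the sketch, and the "kernel" is just an ordinary function
call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*resume_hook_t)(void);

/* Stand-in for the per-process state the kernel would keep. */
struct sim_process {
    resume_hook_t resume_hook;      /* NULL means "no hook registered" */
    int hook_running;               /* peers must not run while set */
    void (*program_code)(void);     /* the process' normal code */
};

/* Records the order in which things ran: H = hook, P = program. */
char trace[32];
static void trace_add(const char *s)
{
    strncat(trace, s, sizeof trace - strlen(trace) - 1);
}

void library_hook(void) { trace_add("H"); }  /* thread-library cleanup */
void program_body(void) { trace_add("P"); }  /* actual program code */

/* Hypothetical registration call, analogous to installing a handler. */
void register_resume_hook(struct sim_process *p, resume_hook_t hook)
{
    assert(p != NULL);
    p->resume_hook = hook;
}

/* What the scheduler would do each time it switches to the process. */
void sim_switch_to(struct sim_process *p)
{
    assert(p != NULL);
    if (p->resume_hook != NULL) {
        p->hook_running = 1;        /* block peers until the hook returns */
        p->resume_hook();
        p->hook_running = 0;
    }
    p->program_code();              /* rest of the time quantum */
}

/* Peers sharing this process may run only when no hook is in flight. */
int sim_peer_runnable(const struct sim_process *p)
{
    return !p->hook_running;
}
```

The simulation cannot show preemption in the middle of the hook, which
is exactly the case the hook_running flag is meant to cover.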

Children of the main thread (in a kernel-thread/user-thread hybrid
scheme) would not be able to register a handler, then.  Such a system
might make possible some interesting GC systems in userland code, too.

--Corey