When learning more about how operating systems and the hardware they run on work and interact with each other, you may be surprised to see what appears to be oddities or under-utilization of “resources” occurring. Why is that? Today’s SuperUser Q&A post has the answer to a curious reader’s question.
Today’s Question & Answer session comes to us courtesy of SuperUser—a subdivision of Stack Exchange, a community-driven grouping of Q&A web sites.
The Question
SuperUser reader AdHominem wants to know why x86 CPUs only use two out of four rings:
Linux and Windows based x86 systems only use Ring 0 for kernel mode and Ring 3 for user mode. Why do processors even distinguish four different rings if they all end up just using two of them anyway? Has this changed with the AMD64 architecture?
Why do x86 CPUs only use two out of four rings?
The Answer
SuperUser contributor Jamie Hanrahan has the answer for us:
There are two primary reasons.
The first is that, although the x86 CPUs do offer four rings of memory protection, the granularity of protection offered thereby is only at the per-segment level. That is, each segment can be set to a specific ring (privilege level) along with other protections like write-disabled. But there are not that many segment descriptors available. Most operating systems would like to have a much finer granularity of memory protection, like… for individual pages.
So, enter page table-based protection. Most, if not all, modern x86 operating systems more or less ignore the segmenting mechanism (as much as they can anyway) and rely on the protection available from the low-order bits in page table entries. One of these is called the “privileged” bit. This bit controls whether or not the processor has to be in one of the “privileged” levels to access the page. The “privileged” levels are PL 0, 1, and 2. But it is just one bit, so at the page-by-page protection level, the number of “modes” available as far as memory protection is concerned is just two: A page can be accessible from non-privileged mode, or not. Hence, just two rings. To have four possible rings for each page, they would have to have two protection bits in each page table entry to encode one of four possible ring numbers (just as do the segment descriptors). However, they do not.
The other reason is a desire for operating system portability. It is not just about x86; Unix taught us that an operating system could be relatively portable to multiple processor architectures, and that it was a good thing. And some processors support only two rings. By not depending on multiple rings in the architecture, the operating system implementers made the operating systems more portable.
There is a third reason that is specific to Windows NT development. NT’s designers (David Cutler and his team, whom Microsoft hired away from DEC Western Region Labs) had extensive previous experience on VMS; in fact, Cutler and a few of the others were among VMS’s original designers. And the VAX processor for which VMS was designed does have four rings (VMS uses four rings).
But the components that ran in VMS’s Rings 1 and 2 (Record Management Services and the CLI, respectively) were left out of the NT design. Ring 2 in VMS was not really about operating system security, but rather about preserving the user’s CLI environment from one program to the next, and Windows did not have that concept; the CLI runs as an ordinary process. As for VMS’s Ring 1, the RMS code in Ring 1 had to call into Ring 0 fairly often, and ring transitions are expensive. It turned out to be far more efficient to just go to Ring 0 and be done with it rather than have a lot of Ring 0 transitions within the Ring 1 code (again, not that NT has anything like RMS anyway).
As for why x86 implemented four rings while operating systems did not use them, you are talking about operating systems of far more recent design than x86. A lot of the system programming features of x86 were designed long before NT or true Unix-ish kernels were implemented on it, and they did not really know what the operating system would use. It was not until we got paging on x86 that we could implement true Unix-ish or VMS-like kernels.
Not only do modern x86 operating systems largely ignore segmenting (they just set up the C, D, and S segments with a base address of 0 and size of 4 GB; F and G segments are sometimes used to point to key operating system data structures), they also largely ignore things like “task state segments”. The TSS mechanism was clearly designed for thread context switching, but it turns out to have too many side effects, so modern x86 operating systems do it “by hand”. The only time x86 NT changes hardware tasks is for some truly exceptional conditions, like a double fault exception.
Regarding x64 architecture, a lot of these disused features were left out. To their credit, AMD actually talked to operating system kernel teams and asked what they needed from x86, what they did not need or did not want, and what they would like added. Segments on x64 exist only in what might be called vestigial form, task state switching does not exist, etc., and operating systems continue to use just two rings.
Have something to add to the explanation? Sound off in the comments. Want to read more answers from other tech-savvy Stack Exchange users? Check out the full discussion thread here.
0 comments:
Post a Comment