yunel | what is pgcl |
wli | yunel: It's an abbreviation for "page clustering". Explaining what the code that does it actually does, and so on, might take a bit more. |
gar|IG | wooohoooooo ! |
gar|IG | nice talk wli, but pretty difficult for the "standard" user |
wli | gar|IG: Hmm. I was trying to be as accessible as possible, since I hadn't ever given an explanation of the idea and/or concept. |
wli | gar|IG: I have trouble doing that in general, though. |
gar|IG | wli: that's no problem :) it's just not the kind of literature that is easily understandable |
gar|IG | wli: the stuff about the pagefaulting is still kinda fuzzy to me. |
wli | gar|IG: Okay, is it the faultahead benefit, or the way it fills things in to prevent memory waste? |
gar|IG | wli: the way it fills in things. i don't have a clear understanding of what exactly happens |
wli | gar|IG: Aha! I handwaved there for a reason: it's complex. |
wli | gar|IG: Okay, basically you _need_ to allocate something that's of size MMUPAGE_SIZE to get a valid pte, right? |
wli | MJesus: I'm not sure; I sort of work on the programming side and don't follow global policy decisions and so on. |
gar|IG | wli: pte is a "page table entry", right? if so, that part i understand |
wli | gar|IG: Well, the trouble you get into is that you only need MMUPAGE_SIZE (say, 4KB), and when you do alloc_page() you actually get PAGE_SIZE (say, 64KB). |
wli | gar|IG: for file-backed memory there is no trouble at all: you can just look it up in the pagecache, right? |
wli | gar|IG: for anonymous memory, there isn't anywhere to look it up, so if you don't use it all (or set up a way to look it up), you waste PAGE_SIZE - MMUPAGE_SIZE (== 60KB in that example). |
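A quick check of the arithmetic above, as a minimal Python sketch assuming the example sizes wli gives (4KB MMUPAGE_SIZE, 64KB PAGE_SIZE). `wasted_bytes` is a hypothetical helper for illustration, not a pgcl function:

```python
MMUPAGE_SIZE = 4 * 1024   # hardware (MMU) page size, example value from the talk
PAGE_SIZE = 64 * 1024     # kernel allocation unit under pgcl, example value

# Number of MMU-sized pieces in one kernel page.
PIECES_PER_PAGE = PAGE_SIZE // MMUPAGE_SIZE


def wasted_bytes(pieces_used: int) -> int:
    """Bytes of an anonymous PAGE_SIZE allocation that stay unmapped
    (and so are wasted) when only `pieces_used` of its MMUPAGE_SIZE
    pieces end up with a pte pointing at them."""
    if not 0 < pieces_used <= PIECES_PER_PAGE:
        raise ValueError("pieces_used must be between 1 and PIECES_PER_PAGE")
    return PAGE_SIZE - pieces_used * MMUPAGE_SIZE


print(PIECES_PER_PAGE)                # 16
print(wasted_bytes(1) // 1024, "KB")  # 60 KB
```

Using all 16 pieces (`wasted_bytes(16)`) brings the waste down to zero, which is the goal of the filling-in described next.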
gar|IG | wli: okay, that's logical imho |
wli | gar|IG: the basic idea of what goes on is that you scan around looking for ptes you can also fill in at the same time. |
wli | gar|IG: You try to find enough of them so you can use up the whole PAGE_SIZE (for 4KB MMUPAGE_SIZE and 64KB PAGE_SIZE you would need to find 16). |
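The filling-in idea can be modeled very roughly in Python. This is a simplified sketch under stated assumptions, not the actual (hairy) pgcl algorithm: a page table is a list where `None` means an empty pte, candidates are only looked for in the PAGE_SIZE-aligned group of 16 ptes around the fault (the real code scans differently), and `fill_anon_fault` is a hypothetical name:

```python
PIECES_PER_PAGE = 16  # PAGE_SIZE // MMUPAGE_SIZE for 64KB / 4KB


def fill_anon_fault(ptes, fault_idx, new_page):
    """Map as many MMUPAGE_SIZE pieces of `new_page` as possible.

    `ptes` is a list of pte slots (None = empty). The faulting slot gets
    piece 0; other empty slots in the same aligned group of
    PIECES_PER_PAGE slots get the following pieces. Returns the list of
    slot indices that were filled.
    """
    if ptes[fault_idx] is not None:
        raise ValueError("faulting pte is already populated")
    start = fault_idx - fault_idx % PIECES_PER_PAGE
    end = min(start + PIECES_PER_PAGE, len(ptes))
    # Faulting slot first, then the other empty neighbours in address order.
    candidates = [fault_idx] + [i for i in range(start, end)
                                if i != fault_idx and ptes[i] is None]
    for piece, idx in enumerate(candidates):
        ptes[idx] = (new_page, piece)
    return candidates


table = [None] * 32
table[3] = ("old", 0)  # one slot is already mapped to something else
filled = fill_anon_fault(table, fault_idx=5, new_page="P")
print(len(filled))     # 15 pieces used, so only one piece is wasted
```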
wli | And the particular way it does this is hairy. |
wli | gar|IG: Did that help any? |
gar|IG | wli: yep.. i understand |
wli | great |
gar|IG | wli: i'm interested in those things, is there some documentation about it online? or is it just reading the source? |
wli | gar|IG: This also sort of leads to the reason why MMUPAGE_SIZE-aligned mmap() still works: all you have to do, say, for file-backed memory, is skip filling in the ptes the vma doesn't cover. |
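For file-backed memory, the rule above amounts to mapping only those pieces of the pagecache page whose virtual address falls inside the vma. A Python sketch under assumptions: the vma is described by MMUPAGE_SIZE-aligned `vm_start`/`vm_end` and a file offset `pgoff` counted in MMUPAGE_SIZE units, and `mappable_pieces` is a hypothetical helper:

```python
MMUPAGE_SIZE = 4 * 1024
PAGE_SIZE = 64 * 1024
PIECES_PER_PAGE = PAGE_SIZE // MMUPAGE_SIZE


def mappable_pieces(vm_start, vm_end, pgoff, fault_addr):
    """Return (piece, vaddr) pairs for the pagecache page backing fault_addr.

    The vma maps file offset pgoff * MMUPAGE_SIZE at vm_start; vm_start,
    vm_end and pgoff only need MMUPAGE_SIZE alignment. Pieces of the
    PAGE_SIZE page that fall outside [vm_start, vm_end) are skipped,
    i.e. no pte is filled in for them.
    """
    # File offset of the fault, in MMUPAGE_SIZE units.
    file_mmupage = pgoff + (fault_addr - vm_start) // MMUPAGE_SIZE
    first = file_mmupage - file_mmupage % PIECES_PER_PAGE  # start of the page
    result = []
    for piece in range(PIECES_PER_PAGE):
        vaddr = vm_start + (first + piece - pgoff) * MMUPAGE_SIZE
        if vm_start <= vaddr < vm_end:
            result.append((piece, vaddr))
    return result


# A mapping 3 MMU pages long, starting 1 MMU page into the file.
pieces = mappable_pieces(0x10000, 0x13000, pgoff=1, fault_addr=0x11000)
print([p for p, _ in pieces])  # [1, 2, 3]
```

Pieces 0 and 4 through 15 of the same 64KB page stay in the pagecache but get no pte in this process.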
wli | gar|IG: There isn't any documentation; there aren't even papers on it. |
wli | ftp://ftp.veritas.com/linux/ is the original; ftp://ftp.kernel.org/pub/linux/kernel/people/wli/vm/pgcl/ is |
gar|IG | let me bookmark those |
wli | gar|IG: For anonymous memory, there is no file-offset-based calculation, so you can (in theory) sprinkle the pieces anywhere anonymous memory could be mapped, though the current fault handlers don't try to do anything like that. |
wli | gar|IG: There is also no _necessary_ distinction between anonymous pages that are COW copies of things and anonymous pages that are zeroed out, but the current fault handlers make that distinction anyway. |
clsk | Would it be possible to have some sort of procedure in the kernel that keeps track of all the memory requested by userspace programs, so that when someone writes past where they are supposed to touch, the kernel flags an error instead of creating chaos in the OS? And if it is possible, is it somewhere on a TODO list for the Linux kernel? |
wli | clsk: This is already done. The fault handlers all check whether the fault came from userspace and was on a kernel address, and if so they send the process a signal instead of faulting in the memory. |
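The check can be sketched as a toy Python model. `TASK_SIZE` (the boundary between user and kernel addresses) is set to 0xC0000000, the usual i386 3GB/1GB split, purely as an assumption; `classify_fault` is hypothetical and ignores everything else a real fault handler does (vma lookup, permission checks, and so on):

```python
TASK_SIZE = 0xC0000000  # assumed user/kernel boundary (i386 default 3G/1G split)


def classify_fault(addr: int, from_user: bool) -> str:
    """Decide what a toy fault handler does with a fault at `addr`."""
    if from_user and addr >= TASK_SIZE:
        # Userspace touched a kernel address: deliver a signal
        # (SIGSEGV assumed here) instead of faulting anything in.
        return "SIGSEGV"
    return "fault_in"


print(classify_fault(0xC0001000, from_user=True))  # SIGSEGV
print(classify_fault(0x08048000, from_user=True))  # fault_in
```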
gar|IG | wli: before i understand all those things i think i need to understand more about COW and fault handling |
wli | gar|IG: Well, there's not too much depth needed for what I was explaining. If I can take a quick shot at it: |
wli | gar|IG: Anonymous pages aren't in lookup structures associated with files. There are two ways to get anonymous memory: the first is anonymous memory that's zeroed out, and the second is anonymous memory that's a copy of something. The code that tries to find ptes checks for the difference and doesn't mix the two when it's looking for ptes that could use more pieces of the page. A smarter fault handler could use both kinds to use up the pieces of the page. |
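A rough Python model of the rule just described, where the search for extra ptes never mixes the two kinds of anonymous memory. The pte states (`NONE` for an empty pte that would be zero-filled, `COW` for a read-only pte whose contents would be copied, `RW` for one that needs nothing) and the helper name `gather_same_kind` are assumptions made for illustration:

```python
from enum import Enum


class PteState(Enum):
    NONE = "none"  # empty: would be backed by a zeroed piece
    COW = "cow"    # read-only shared: would be backed by a copied piece
    RW = "rw"      # already private and writable: nothing to do


def gather_same_kind(states, fault_idx, limit):
    """Collect up to `limit` pte indices of the same kind as the fault.

    Zero-fill candidates and copy-on-write candidates are never mixed
    when using up the pieces of one page.
    """
    kind = states[fault_idx]
    if kind is PteState.RW:
        return []
    picked = [fault_idx]
    for i, s in enumerate(states):
        if len(picked) >= limit:
            break
        if i != fault_idx and s is kind:
            picked.append(i)
    return picked


states = [PteState.NONE, PteState.COW, PteState.NONE, PteState.RW, PteState.COW]
print(gather_same_kind(states, 0, limit=16))  # [0, 2]
print(gather_same_kind(states, 1, limit=16))  # [1, 4]
```

The "smarter fault handler" wli mentions would roughly correspond to dropping the `s is kind` filter, so that both kinds of pte could share one page.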
gar|IG | aha, i see |
wli | gar|IG: Here's where it gets interesting. I already did that. But I had trouble implementing a very basic performance optimization in combination with it. That optimization is noticing, when you fault on an anonymous page, that you are its only user, and then, instead of allocating a new one and copying, taking ownership of it and remapping it read-write. |
wli | gar|IG: Because of that performance problem, I spent about 3 months rewriting all the fault handling to work the same way as the 2.4 version, apart from some rather difficult problems supporting rmap and highpte. |
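The optimization just described (reusing an anonymous page on a write fault when the faulting process is its only user) looks roughly like this in a conventional, non-pgcl model. This is a Python sketch: the `Page`/`Pte` classes and the `mapcount` field are simplified assumptions, and it deliberately leaves out the pgcl complication that one PAGE_SIZE page can be handed out piecewise to many ptes, which is what made the optimization hard:

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes
    mapcount: int = 1  # number of ptes mapping this page (simplified)


@dataclass
class Pte:
    page: Page
    writable: bool = False


def write_fault(pte: Pte) -> str:
    """Handle a write fault on a read-only anonymous mapping."""
    if pte.page.mapcount == 1:
        # Sole user: take ownership and remap read-write, no copy.
        pte.writable = True
        return "reused"
    # Shared: drop our reference to the old page and map a private copy.
    pte.page.mapcount -= 1
    pte.page = Page(data=bytes(pte.page.data))
    pte.writable = True
    return "copied"


shared = Page(b"abc", mapcount=2)
a, b = Pte(shared), Pte(shared)
print(write_fault(a))  # copied  (b still maps the old page)
print(write_fault(b))  # reused  (b is now the only user)
```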
wli | gar|IG: It is. It's also very difficult because a large number of drivers and filesystems need to be updated to keep them working. |
gar|IG | wli: i see... |
wli | It's also very difficult to keep it up to date, because even without touching any driver or filesystem code it's something like a 200KB-300KB patch. I had 40 rejects the last time I moved between releases. |
gar|IG | ouch ouch... and you're the only one who writes code for it? |
wli | Zwane Mwaikambo is a regular contributor and the only other person who sends in code for it, though I appear to absorb the vast majority of the programming work. |
gar|IG | hmmzzz, there are 3 people around me that want to get some food and they are trying to drag me with them. |
wli | Okay, thanks for listening. |
EMPE[log] | thanks to you, wli |
krocz | thanks to you |
wli | Anytime. I'll be around for a bit more just in case. |
ducky | thanks wli |
gar|IG | wli: cool :-) btw, i have translated your talk to dutch, so you've just become international |
wli | Excellent! |
MJesus | wli, thank you VERY MUCH!!!! |
MJesus | clap clap clap clap clap clap clap clap clap clap |