*** riel sets mode: +m |
riel | I guess it's better if we let Alan have a few minutes of rest before the talk |
riel | OK, welcome to this UMEET lecture |
riel | today Alan Cox will hold a talk titled |
riel | "Optimising for modern processors" |
riel | in order to keep this channel readable, we have set it +m |
riel | so only the ops can talk |
riel | if you have a question during the lecture, you can always ask it in #qc |
riel | you probably already know Alan Cox, who is one of the driving forces behind Linux kernel development |
riel | he is a man of many talents though |
riel | in fact, he even prepared slides for this talk |
riel | you can find those on: |
riel | http://www.linux.org.uk/~alan/Slides/ |
riel | Alan, go ahead when you're ready |
Alan | Ok |
Alan | This talk is partly about how modern processors work |
Alan | Mostly however it's about why this changes the way you need to program to get the best performance
Alan | By modern processors we really mean anything from the pentium onwards - in some ways from the 486 onwards |
Alan | Ten years ago a 40MHz processor was pretty fast. Today the same position is occupied by a 3GHz processor |
Alan | Memory has not increased speed to cope with this, and more importantly it has not improved in latency (the time from asking for a piece of memory to getting it) much at all |
Alan | The new processors can also execute multiple instructions each clock cycle, so in fact the processor might want to be accessing memory not 100 times faster, as you might think from the clock rate change, but nearer 500 times faster
Alan | To deal with this processors added cache memory. It is possible to build systems which just have very fast memory but its incredibly expensive |
Alan | The sort of computer you find on your desktop today has a very slow memory subsystem - things like 133MHz SDRAM and DDR RAM have improved the data rate but not enough, and have done little to improve the access time for a given piece of data
Alan | To give you an idea how slow main memory is compared with the cache I measured the copying speed of data in the on processor cache (called the L1 or level 1 cache) and the larger slower cache (The L2 or level 2 cache) |
Alan | On an Athlon the L2 cache was six times slower for copying than the L1 cache |
Alan | Main memory is eight times slower than the L2 cache |
Alan | So every piece of data you have to fetch from main memory you could have fetched fifty from the cache |
Alan | This makes keeping the right things in the cache extremely important, as well as knowing how the cache works so that you can understand what is needed to get the best use from it |
Alan | In 'real world' terms if your L1 cache was your desk and took 1 second to access your main memory (your filing cabinet say) would take one minute for each item you had to find |
Alan | The same things are true for pretty much all modern processors. The newer the processor quite often the larger the gap because the processor is getting faster more rapidly than the memory |
Alan | Worse still there are physical limitations on how fast the memory can go, and how quickly signals can travel across the motherboard - the speed of light really is too slow nowadays
Alan | The obvious question then is what is in the cache. If we know what is in our cache and what data it will keep we have some idea how we want to write our programs |
Alan | d33p] can the cache be directly manipulated by a programmer.. I would have thought it wasn't?
Alan | d33p: in the normal cases you can't directly control the cache.. but you can understand how the cache will behave |
Alan | d33p: there are instructions on newer processors where you can help the cache along - but that's the last slide of the talk 8)
Alan | The cache holds the most recently used code and data. So if you execute a loop the loop will end up in the cache |
Alan | similarly if you are looking at a list regularly the list contents will end up in your cache
Alan | Because the cache is quite primitive in some ways the actual data it can store in each piece of the cache (each cache line) is quite restricted. |
Alan | The processor doesn't have time to look at all of the cache to see if a piece of data is already in the cache. Instead it breaks the address up into several pieces
Alan | The upper bits of the address (address & ~4095) go to the memory management hardware to turn a virtual address into a physical address
Alan | At the same time the lower bits are passed to the cache. The cache looks at the remaining bits (ignoring the lowest 4-6 depending on the processor) |
Alan | and it looks in two or four places to see if the data it needs is present. If it is then it uses this data, and cancels the work the memory management hardware is doing
Alan | That limitation has some fun effects which I'll demonstrate later on |
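The address split Alan describes can be sketched in C. The line size, set count and function names below are purely illustrative (a 64-byte line, 512-set cache), not any particular CPU's:

```c
#include <stdint.h>

/* Illustrative sketch of how a CPU splits an address to index a cache:
 * low bits = byte within the cache line, middle bits = which set to
 * search, upper bits = the tag compared against what is stored there. */
#define LINE_BITS 6   /* 64-byte cache line */
#define SET_BITS  9   /* 512 sets */

static inline uint32_t cache_offset(uint32_t addr) {
    return addr & ((1u << LINE_BITS) - 1);               /* byte within line */
}
static inline uint32_t cache_set(uint32_t addr) {
    return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1); /* set to look in */
}
static inline uint32_t cache_tag(uint32_t addr) {
    return addr >> (LINE_BITS + SET_BITS);               /* tag to compare */
}
```

Note that any two addresses exactly 32K apart (with these made-up sizes) land in the same set - which is the restriction the next demonstrations exploit.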
Alan | Not all of memory is cached of course - it would be a bad idea if data for your display was cached and you didn't see the text because the processor was holding it
Alan | init64] Alan_Q: some caches have another way to find lines. They use 1 "comparator" per cache line
Alan | init64] but I guess it's too expensive for big amounts of cache
Alan | init64: basically yes - the more complex an algorithm the longer it takes to run - even in hardware |
Alan | With a 3GHz processor you don't have very long to decide if something is in cache or not |
Alan | That is also one reason it is common to have a small fast L1 cache, and a larger slower (but smarter) L2 cache |
Alan | The processors normally deal with memory in chunks of 16, 32 or 64 bytes. |
Alan | So each piece of cache holds chunks of those sizes and aligned to that size. The chunks get bigger as the L1 caches get bigger generally |
Alan | All of that is loaded at the same time. When you ask an Athlon for a byte of data it will load the entire 64-byte chunk that contains it.
Alan | This means several things - one of which is that if you are going to use data, put the data you use together |
Alan | The kernel goes to great pains to put structures in an order where data that is used together is close together |
Alan | because if you loaded one bit of that data you have the rest anyway |
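As a sketch of that kernel habit, here are two hypothetical struct layouts in C; the field names are invented for illustration, not taken from the kernel. The second layout puts the fields touched on every access next to each other, so one cache line fetch covers both:

```c
#include <stddef.h>

/* Hypothetical layout: hot fields separated by cold ones, so using
 * both hot fields may pull in two cache lines. */
struct conn_bad {
    char name[60];        /* cold: only read for reports          */
    int  hits;            /* hot: touched on every lookup         */
    char description[60]; /* cold                                 */
    int  last_jiffies;    /* hot: touched on every lookup         */
};

/* Same fields, hot ones grouped: both arrive in one line fetch. */
struct conn_good {
    int  hits;            /* hot */
    int  last_jiffies;    /* hot, adjacent to hits */
    char name[60];        /* cold */
    char description[60]; /* cold */
};
```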
Alan | Ok now a first demonstration of why knowing about caches matter is demo1 (http://www.linux.org.uk/~alan/Slides/slide5.html) |
Alan | This is a very simple program that writes 4096 values into memory |
Alan | (we run it lots of times to get some numbers) |
Alan | We run this with the data spaced out on 1,2,4,8,16,32 and 64 byte boundaries |
Alan | just like updating an array of different sized structures |
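A minimal sketch of what demo1 does (the real source is linked from the slides; this is a reconstruction of the idea, not Alan's code). With stride 1 many writes share each cache line; by stride 64 every write touches a different line:

```c
#include <stddef.h>

/* Reconstruction of demo1's access pattern: write 4096 values spaced
 * 'stride' bytes apart, repeated many times. Larger strides touch
 * more cache lines for the same amount of useful data. */
static char buf[4096 * 64];

void touch(size_t stride, int repeat) {
    for (int r = 0; r < repeat; r++)
        for (size_t i = 0; i < 4096; i++)
            buf[i * stride] = (char)i;   /* one write per element */
}
```

Timing `touch(1, N)` against `touch(64, N)` on real hardware shows the drop the demo measures.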
Alan | tarzeau] how do you run demo1 in single user mode? |
Alan | tarzeau: it'll give you reasonably reliable answers if you just run it without too much else going on |
Alan | even multiuser |
Alan | So what demo1 does is much like a large number of perfectly normal applications. |
Alan | You'll notice that even on a Pentium IV with very good memory and large caches, performance drops considerably
Alan | ifvoid] how large are the P4 L1 and L2 cache? |
Alan | That varies depending upon whether it is a Xeon or not |
Alan | Those numbers are from a Xeon with 512K of L2 cache and I think 64K of L1 cache |
Alan | If you run the program on something like a Celeron then you would see a much more rapid reduction in performance |
Alan | you'd probably also want to change the for loop to do 100000 not 1000000 or you'll be waiting for it all night |
Alan | You can actually use techniques like this to find the properties of the cache on a processor |
Alan | We don't do that in Linux because the kernel knows how to ask the processor properly for the data (and puts much of it into /proc/cpuinfo) |
Alan | What this tells us is that if you are going to scan large blocks of data, you want the values you are scanning to be together in memory |
Alan | If we do 4096 comparisons of values close to each other we could be several times faster than if we looked at one field in each element of an array |
Alan | That demonstrates how important careful planning is. |
Alan | Of course in many cases you can use trees, hashes or other much more intelligent data structures to achieve the same results or better |
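One hedged way to picture that in C (names and sizes invented for illustration): counting matches against keys kept inside 64-byte structs versus in a packed array. Both loops do the same comparisons, but the packed array gets 16 keys per 64-byte cache line:

```c
/* Illustrative: same search done over a struct array (one cache line
 * fetched per key examined) and over a packed key array (16 keys per
 * line). The results are identical; the memory traffic is not. */
struct record { int key; char payload[60]; };  /* 64 bytes each */

int count_matches_aos(const struct record *r, int n, int k) {
    int c = 0;
    for (int i = 0; i < n; i++)
        if (r[i].key == k) c++;
    return c;
}

int count_matches_soa(const int *keys, int n, int k) {
    int c = 0;
    for (int i = 0; i < n; i++)
        if (keys[i] == k) c++;
    return c;
}

/* tiny example data */
static struct record recs[3] = { {1, {0}}, {2, {0}}, {1, {0}} };
static int keys[3] = { 1, 2, 1 };
```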
Alan | The second demonstration is designed to show something else |
Alan | On older processors it was very common to use lookup tables for things like division in 3D games. With a modern processor this isn't always so clear
Alan | Demo2 finds out how long it takes to do a lot of divisions, then compares it with using a lookup table to do the same thing |
Alan | On the Pentium IV, once the lookup table exceeds 128K the performance is actually better doing the maths - even though divide is an extremely costly operation
Alan | This is because looking data up in main memory is actually more expensive than doing division |
Alan | (again on a slower box you may well want to make the loop somewhat smaller) |
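A rough sketch of demo2's comparison (a reconstruction, not the actual demo source): the same quotient computed by real division and by a precomputed table. Timing many random accesses to each shows the table only wins while it stays cached:

```c
/* Reconstruction of demo2's idea: divide versus table lookup.
 * The table size here is small and illustrative; the demo grows it
 * to 1MB and beyond until cache misses make division cheaper. */
#define TABLE_SIZE 4096
static unsigned table[TABLE_SIZE];

void build_table(unsigned divisor) {
    for (unsigned i = 0; i < TABLE_SIZE; i++)
        table[i] = i / divisor;          /* precompute every quotient */
}

unsigned div_math(unsigned x, unsigned divisor) {
    return x / divisor;                  /* costly ALU op, no memory */
}

unsigned div_table(unsigned x) {
    return table[x];                     /* cheap op, memory access (x < TABLE_SIZE) */
}
```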
Alan | [14-Dec:18:37 smesjz] Step of 1 across 4K took 50 seconds. (p133/2.5.49) :) |
Alan | smesjz: Gives you an idea how much processor performance has changed |
Alan | The division one is quite interesting because it depends heavily on the processor |
Alan | on something like a pentium the lookup table will be way way cheaper |
Alan | but by the time you reach the athlons and PIII/PIV it becomes a lot less clear |
Alan | smesjz] Alan_Q: it surprises me that 5 out of demo1 tests return 50 seconds runtime (100k iterations)
Alan | smesjz: To find out why you'd really have to look at what was going on more deeply - I don't know either, unless your memory is the real limit |
Alan | [14-Dec:18:41 rp] can someone please explain exact meaning of 1024 tablesize? |
Alan | rp: the different runs access data from lookup tables that are 64K, 128K and so on |
Alan | so 1024K lookup table means the program is simulating random accesses to 1Mbyte of lookup data |
Alan | if you run this with bigger and bigger sizes eventually the performance becomes about constant. That gives you a good idea that the cache is no longer helping out |
Alan | This particular test is very important for things like image processing, JPEG compression and the like
Alan | one of the reasons that MMX is such a help for video processing is that it lets you do a lot of processing at the same time, so you can avoid lookup tables when doing things like colour conversion |
Alan | ridiculum] is there in linux any tool like VTune (intel) to debug things like Alan is explaining?
Alan | ridiculum: There are two - there is an open source thing called "oprofile", and there is Intel VTune which is expensive and requires a second windows PC and other things
Alan | ridiculum: you can look at a lot of the statistics because the newer processors have debugging registers |
Alan | they allow tools like oprofile to ask the processor "how many cache misses", "how often did you have to wait for data" |
Alan | and other similar questions. |
Alan | http://oprofile.sourceforge.net/ is the OProfile profiler |
Alan | So we've got some simple demonstrations of how important the cache is |
Alan | [14-Dec:18:48 avoozl] valgrind also might be interesting to look at, the latest version also can do cache simulation and show where in a program cache misses are occurring
Alan | avoozl: yes I had forgotten valgrind can do cache simulation too
Alan | tarzeau] were there CPUs without cache? Intel's 80286?
Alan | tarzeau: there were a lot. The 286 was almost never faster than the RAM it was attached to - similarly on the Amiga the RAM is almost twice as fast as the processor
Alan | The amiga actually used that trick to give the processor and support chips shared access
Alan | processor and chipset having alternate access |
Alan | So what have we learned about the cache and making good use of it?
Alan | Well - we know that when we get data we get it in chunks so we can put things we use in the same place. |
Alan | That helps the processor and also happens to help virtual memory (when you swap data to disk you do so in 4K chunks so you may as well keep data together for that too)
Alan | We've demonstrated that you want to keep your processing fitting within the cache. One reason Intel sell expensive processors with very large caches is that databases find it hard to do this |
Alan | so the Xeons and the really expensive pentium-pro with 1Mb caches were good for database work |
Alan | We also know that only a certain amount of data at a given alignment can be cached |
Alan | The kernel actually uses special memory allocators to try and scatter objects the kernel uses onto different alignments specifically because of this |
Alan | tarzeau] those edo memory sticks were 60ns and 70ns, how many ns are L1 and L2 caches?
Alan | tarzeau: for modern processors I'm not actually sure - even on the 486, L2 cache was about 16ns
Alan | If you have an array of objects that are power of two sized you are likely to be getting almost the worst possible performance from the caches
Alan | so it's a useful trick to add a little extra unused space to each block of memory to pad out the array elements so they cache better
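For instance (sizes purely illustrative): padding a power-of-two element out to a non-power-of-two size means successive elements no longer land on the same small group of cache sets:

```c
/* Illustrative: with 256-byte elements, the hot field of every element
 * sits at a power-of-two stride, so the elements compete for a small
 * group of cache sets. A 64-byte pad makes the stride 320 bytes, which
 * spreads the elements across many more sets. */
struct elem_bad {
    char data[256];              /* power-of-two size: collides */
};

struct elem_good {
    char data[256];
    char pad[64];                /* wasted space that buys set spread */
};
```

The trade-off Alan notes still applies: the pad itself consumes cache, so this is only a win when the set collisions were the real bottleneck.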
Alan | Ok time for the next demonstration |
Alan | http://www.linux.org.uk/~alan/Slides/slide7.html |
Alan | (there isn't a demo3.. I took it out to make the talk fit the time better 8))
Alan | This is designed to show how different ways of doing something can have different performance because of the caches |
Alan | what it actually does (adding numbers) is fairly trivial, but it's not that unlike real programming examples
Alan | The first run generates a large set of data, and then adds it up. Generating sets of data then processing them, then processing the results is a very common way of programming |
Alan | but it can actually give the worst possible behaviour |
Alan | The second run we add the data up as we generate it, and get much better performance. |
Alan | This is mostly because in the first run we end up emptying all the data out of the cache and then loading it back in again
Alan | In the second case because we add as we go the data only ever leaves the cache once |
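The two styles can be sketched like this (an illustrative reconstruction of the demo, not its actual source). The fused version touches each value while it is still hot in cache:

```c
/* Reconstruction of the demo's two styles. When N is bigger than the
 * cache, the two-pass version reloads every value from main memory
 * in the second pass; the fused version never loses them. */
#define N (1 << 20)
static int data[N];

long generate_then_sum(void) {
    for (int i = 0; i < N; i++)          /* pass 1: generate */
        data[i] = i & 0xff;
    long sum = 0;
    for (int i = 0; i < N; i++)          /* pass 2: data long since evicted */
        sum += data[i];
    return sum;
}

long sum_while_generating(void) {
    long sum = 0;
    for (int i = 0; i < N; i++) {        /* one pass: add while still cached */
        data[i] = i & 0xff;
        sum += data[i];
    }
    return sum;
}
```

Both produce the same answer; only the memory traffic differs.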
Alan | The final case shows how much of the operation is the actual overhead |
Alan | What this means is that for any large amount of data and computation it is important to work on it in chunks. |
Alan | Engineers and high performance computing people do this all the time - the GIMP knows about it too |
Alan | Many things the GIMP does in its filters it does using rectangles of the image rather than applying each change to the entire image one after another |
Alan | debUgo-] Alan_Q: how much does cache associativity affect general memory performance?
Alan | debugo: keeping data in cache makes a real difference to overall performance - mostly on SMP systems, which is where the next few slides go |
Alan | There are lots of algorithms for this and the same techniques are actually used for clusters and beowulfs - only they are trying to minimise messages over ethernet so it's much much more important than on a single system
Alan | All of this stuff about caching matters much much more when you have a multiprocessor PC |
Alan | Less so on the bigger alpha and sparc machines because they have memory systems designed for multiple processors |
Alan | A dual athlon or dual pentium III/IV however is two processors on the same memory bus |
Alan | The 3 demonstrations have already shown that with a single processor the memory performance is not up to the processor |
Alan | So a dual processor machine gives us twice the problem |
Alan | One of the demonstrations you can do is to run a continuous large memory copy on one processor and time performance of copies on the other - on some dual PC machines the copies being timed will perform at 1/3rd of the speed they run without the other copying loop
Alan | ifvoid] Alan_Q: won't that change for the Hammer and Itanium 3? |
Alan | ifvoid: hammer lets you attach memory to each processor, the more processors you add the more memory controllers you can add |
Alan | ifvoid: it depends what the cost of that is whether vendors will do it |
Alan | rp] so does that mean memory performance does not depend only on the processor but also on bus speed?
Alan | rp: yes |
Alan | runlevel0] <Alan> So a dual processor machine gives us twice the problem :so this explains why we do not get 2x the performance of 1 processor
Alan | runlevel0: there are two reasons you don't get twice the performance |
Alan | the first is that you are sharing a memory which is not fast enough |
Alan | the second is that there is a cost in stopping the system from doing the wrong two things at the same time |
Alan | the kernel has to do real work to stop two people allocating the same memory, using the same disk block and all the other things we don't wish to happen
Alan | The only reason a dual processor PC is usable at all is because most memory accesses are coming from the cache in normal usage |
Alan | each processor has its own cache (except some dual pentium machines which are just painful 8)) |
Alan | ridiculum] what about hyperthreading and cache coherence? |
Alan | ridiculum: hyperthreading shares the cache between the two execution units on that processor |
Alan | so you get to do two things at once but each application will suffer more cache misses |
Alan | sh0nX] so this is where spinlocks come in |
Alan | sh0nX: right - that's the main thing the kernel uses to synchronize things internally
Alan | One thing the processors have to do is to ensure that the two processors don't cache different versions of the same data or miss changes the other processors make
Alan | docelic] Id appreciate more on spinlocks too |
Alan | doc: we'll talk about that a bit after the main talk |
Alan | bzzz] Alan_Q: how pci devices may see data which in cache only? |
Alan | bzzz: the processors as well as making sure they see each others changes do the same with devices on the PCI bus |
Alan | bzzz: The standard caching technique is a thing called MESI |
Alan | That stands for the four states each piece of the cache can be in |
Alan | We have an "M" - or Modified state. That means this piece of information is something this processor has changed and that we have data the other processors don't know about yet
Alan | We have an "E" - or Exclusive state - where we know nobody else has this data but we do
Alan | We have an "S" or shared state, where we know we have a copy of the data but other people also have copies in shared state |
Alan | nobody has it modified |
Alan | and we have "I" or Invalid - where we don't have a clue what is going on but we know we don't have the data |
Alan | At any point two processors cannot have the same data except in shared state |
Alan | When we modify some data we change the state on it - if it was exclusive it becomes modified
Alan | If it was shared we have to kick each of the other processors and make them get rid of their copies |
Alan | If we dont have a copy (I) we must ask for it - this like moving from shared can be quite expensive |
Alan | If another processor had a copy in modified state we have to ask that processor to write it back to memory and then read it ourselves |
Alan | What we want to avoid at all costs is having two processors continually modifying the same data |
Alan | This turns into a sort of food fight on the memory bus |
Alan | and we spend most of our time passing data back and forth between the processors |
Alan | That doesn't get a lot of work done - and once you want to scale to big computers it becomes very important indeed to avoid it |
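The transitions just described can be captured in a toy model. Real coherence involves bus messages between all the caches and memory; this only sketches the local state changes for one line on one CPU, with invented function names:

```c
/* Toy MESI model: the state of one cache line in one CPU's cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* A local write always ends in MODIFIED. From SHARED the other CPUs
 * must first be told to drop their copies; from INVALID the line must
 * first be fetched - both are the expensive cases Alan mentions. */
mesi_t on_local_write(mesi_t s) {
    switch (s) {
    case INVALID:    /* fetch for ownership, then modify  */
    case SHARED:     /* invalidate other copies first     */
    case EXCLUSIVE:  /* nobody else has it: just modify   */
    case MODIFIED:   /* already ours and dirty            */
        return MODIFIED;
    }
    return INVALID;
}

/* Another CPU reads our line: a MODIFIED copy is written back to
 * memory and both copies drop to SHARED. */
mesi_t on_remote_read(mesi_t s) {
    return (s == INVALID) ? INVALID : SHARED;
}
```

The "food fight" case is two CPUs alternately calling `on_local_write` on the same line: each write forces the other's copy through INVALID, and the data ping-pongs across the bus.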
Alan | IBM have been doing a lot of work on kernel code where these kind of fights can occur as they have 16 processor systems |
Alan | which make it very apparent when you get this wrong |
Alan | [14-Dec:19:17 sh0nX] Alan_Q: so we should be using the cache for SMP processors to keep data that isn't going to change much and use the processors to handle data that does change often?
Alan | shonX: there are systems where it makes sense to have heavily shared data uncached. The way the PC hardware works really stops you doing this |
Alan | Even if you make it uncached it is still slow |
Alan | Most of the time this is not a problem - applications dont share a lot of things anyway |
Alan | Threaded applications tend to share very little data thankfully |
Alan | When you design threaded and SMP applications it is important to minimise the amount of time data spends bouncing between processors
Alan | So for example if you were doing JPEG encoding on a multiprocessor system it would be better to use one processor to do the top half of the image and the second processor to do the bottom half |
Alan | than to have one processor do colour conversion and the other processor do the compression pass |
Alan | when it comes to things like mpeg encoding this gets quite tricky |
Alan | In addition it is possible to get what is called "false sharing" |
Alan | sh0nX] Alan_Q: so we want to keep both processors doing OTHER things
Alan | shonx: exactly |
Alan | shonx: like people processors work best when they are not falling over each other |
Alan | False sharing occurs because the processor cache works in 32 or 64 byte chunks |
Alan | If you happen to put two unrelated pieces of data in the same 64 bytes you might accidentally have one thing used by each processor in the same cache line - and start a fight |
Alan | Thus people pad out such structures to make them bigger and avoid this |
Alan | or they keep them apart |
Alan | (padding them out avoids sharing but it means you use more cache of course - so you are doing what demo1 said not to do)
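A sketch of that padding trick, assuming 64-byte cache lines and GCC's `__attribute__((aligned))`; the per-CPU counter example is invented for illustration:

```c
/* False sharing: two counters, each only ever written by its own CPU,
 * but packed into the same 64-byte cache line - every write by one
 * CPU steals the line from the other. */
struct counters_packed {
    long cpu0_count;
    long cpu1_count;   /* same cache line as cpu0_count */
};

/* The cure: pad and align each counter to its own 64-byte line. */
struct padded_counter {
    long count;
    char pad[64 - sizeof(long)];   /* fill the rest of the line */
} __attribute__((aligned(64)));

struct counters_padded {
    struct padded_counter cpu[2];  /* one line per CPU, no fighting */
};
```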
Alan | sh0nX] Alan_Q: so when designing SMP applications, how do we tell which processor to handle which data without causing the processors to both handle the same data?
Alan | shonx: the scheduler tries to keep a given thread running on the same processor as much as possible |
Alan | so it's just a matter of avoiding accessing the same data a lot in two different threads
Alan | Similarly we try and keep a given application running on the same processor so that we don't spend a lot of time copying stuff from one processor to another
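You can also pin a thread to a processor by hand with the Linux affinity call (this sketch assumes a glibc that wraps `sched_setaffinity`; the scheduler normally makes this unnecessary):

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to one CPU so its cached data stays put.
 * Returns 0 on success, -1 on error (like the underlying syscall). */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Forcing affinity is only worth it when you genuinely know your access pattern better than the scheduler does.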
Alan | [14-Dec:19:25 yalu] Alan_Q: is the scheduler smart enough to keep threads who share a lot of data on the same processor?
Alan | yalu: it makes some simple guesses |
Alan | but it's actually very hard to measure the real amount of sharing efficiently
Alan | especially since read only sharing (eg of code) is fine
Alan | zwane] Alan_Q: All this must get really interesting with Hyperthreaded cpus
Alan | zwane: There are reasons Ingo is still fiddling with getting the best performance off such processors 8) |
Alan | With hyperthreading you sort of have two processors per cache |
Alan | and the cache has some other odd internal limits too |
Alan | zwane] Alan_Q: do you reckon scheduler only would suffice? How about leveraging cpu affinity for say doing bias in interrupt handling?
Alan | zwane: There are good arguments in some cases for having a process wake up on the CPU that handled an interrupt. In most cases however it isn't anything like as valuable as you would think
Alan | most of the process data is cached on the cpu that last ran it |
Alan | Most good I/O devices use DMA - so they write to memory themselves, and the memory they wrote to is removed from all the processor caches (since they modified it)
Alan | there are good reasons for sticking interrupts to specific processors
Alan | (if processor 1 has all the data for eth0 cached then why handle the interrupt on processor 2) |
Alan | sh0nX] so, if a program is written for UP, how does the kernel scheduler handle its data on two CPUs? or it can't
Alan | sh0nX: the scheduler can't split up something with only one thread of execution. It can spread different applications around - so it can run your game on one processor and the X server on the other
Alan | sarnold] Alan_Q: does linux currently have a mechanism to specify that all interrupts should be handled by a specific [set of] CPUs?
Alan | sarnold: it has some stuff that Ingo did, it's at the obscure and wondrous end of kernel tuning
Alan | sh0nX] I see, so we have to use threads in our code in order to benefit from SMP
Alan | sh0nX: or two programs - sometimes that is just as easy or easier
Alan | There is one last subject for this talk, then we can move onto most of the questions |
Alan | Someone asked early on about helping the cache out |
Alan | On a modern processor you have instructions like "prefetch" and "prefetchw" |
Alan | These allow you to tell the processor you will be needing data in the future |
Alan | So instead of getting stuck waiting for data to arrive from memory you can tell the processor in advance |
Alan | The big problem with this is you often don't know well in advance which memory you will need |
Alan | A memory copy is easy - and the Athlon memory copy in Linux actually keeps saying "and prefetch me 320 bytes ahead of this point" |
Alan | Similarly things like graphics processing benefit immensely as do programs that use large arrays of data in predictable fashions |
Alan | (fortran does very well here, strangely enough)
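A hedged sketch of such a copy loop using GCC's `__builtin_prefetch` (the real Athlon routine is hand-written assembler using the `prefetch` instruction; this only shows the idea of asking for data a few hundred bytes ahead):

```c
#include <stddef.h>

/* Copy with a prefetch hint issued once per 64-byte cache line,
 * ~320 bytes (about five lines) ahead of the current position.
 * Prefetch is only a hint: prefetching past the end of src is
 * harmless, the hardware simply ignores bad addresses. */
void copy_with_prefetch(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i % 64 == 0)
            __builtin_prefetch(src + i + 320);
        dst[i] = src[i];
    }
}

/* small self-check: the copy must still be an exact copy */
int copy_selftest(void) {
    char src[256], dst[256];
    for (int i = 0; i < 256; i++) src[i] = (char)i;
    copy_with_prefetch(dst, src, 256);
    for (int i = 0; i < 256; i++)
        if (dst[i] != src[i]) return 0;
    return 1;
}
```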
Alan | We use this in the kernel for memory copies and sometimes for lists
Alan | it is hard to use for lists because memory is so slow you want to say "prefetch me about five or six items ahead" |
Alan | <translator wait> [prefetch me a translator ;)] |
* riel dcc's the crowd some virtual beers |
Alan | Ok translators fingers seem to have caught up |
Alan | What we actually need to make this sort of thing work is new data structures
Alan | one of the common approaches is to have lists which know next/previous but also know 'five items on' and 'five items back'. We don't do this in the kernel currently |
Alan | but it may be something we must look at in the future as processors get faster still |
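A sketch of such a structure (invented for illustration; not a kernel data structure): each node also remembers the node five hops ahead, so a walker can prefetch far enough in advance for the fetch to be useful by the time it arrives:

```c
/* Hypothetical list node with an extra 'five items on' pointer,
 * maintained at insert time, used purely as a prefetch target. */
struct pnode {
    int value;
    struct pnode *next;
    struct pnode *ahead5;   /* five items on, or NULL near the tail */
};

long walk_sum(struct pnode *n) {
    long sum = 0;
    while (n) {
        if (n->ahead5)
            __builtin_prefetch(n->ahead5);  /* hide memory latency */
        sum += n->value;
        n = n->next;
    }
    return sum;
}

/* tiny three-node example list: 1 -> 2 -> 3 (too short for ahead5) */
static struct pnode n3 = { 3, 0, 0 };
static struct pnode n2 = { 2, &n3, 0 };
static struct pnode n1 = { 1, &n2, 0 };
```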
Alan | The final useful thing prefetch is used for in the kernel makes use of the Athlon 'prefetchw' which says "I want this data soon, and I will write to it" |
Alan | unlike prefetch this gets an exclusive copy of the data. We use this for prefetching locks - which is something that is very expensive if it has to go to main memory |
Alan | It is very common for a lock structure to belong to another processor and we often know the lock is going to be used so can prefetch it |
Alan | sh0nX] I assume we use some sort of spinlock to prevent another processor from prefetching the same data?
Alan2 | uggh.. lag 8( |
Heimy | mmh... |
Heimy | 19:44 <Alan> sh0nX] I assume we use some sort of spinlock to prevent another |
Heimy | 19:44 <Alan> processor from prefetching the same data? |
Alan2 | We don't actually lock that |
Alan2 | very occasionally we prefetch it and it is stolen by another cpu then fetched back again |
Alan2 | it happens so rarely it is cheaper not to worry |
Alan | Ah .. back again |
Alan2 | or not, as the case may be
Alan2 | Also if you had a lock for the lock - you would want to prefetch for the prefetch |
Alan2 | and so on repeatedly |
Alan2 | So in the kernel we treat prefetch very much as a hint |
Alan2 | if it does the right thing most times then it is fine..
Alan2 | Ok that is really the end of the main part of the talk |
Alan2 | hopefully it has given people some ideas of why caches matter |
Alan2 | and a bit about programming with them in mind |
Alan2 | If we can start with on topic questions before we wander off that would be best |
riel | I guess people should ask the on topic questions in #qc |
riel | so we can leave #linux moderated for a few more minutes |
Alan2 | sarnold:#qc] Alan2: i've wondered if prefetching cuts memory bandwidth significantly.. have people tested with prefetch config'ed away?
Alan2 | sarnold: we've done a fair amount of testing. Most of the time prefetching actually helps use memory bandwidth that would otherwise be wasted |
Alan2 | The athlon one was so fine tuned that we broke some VIA chipsets due to a hardware bug though 8) |
Alan | rene:#qc] Alan2: talk seemed to be about caching alone. do things like instruction alignment make a lot of difference on modern processors?
Alan | rene: they matter a bit - it depends on the processor how much. gcc does know how to get these right when you pick a processor type. Normally however it is under 1% |
Alan | Arador:#qc] Alan2: what're the effects of preempt on caching? |
Alan | arador: the more you switch between tasks the less useful the cache gets
Alan | Pre-empt doesn't really make a lot of difference |
Alan | It is however why systems designed for a lot of simultaneous users have a lot of cache |
Alan | [14-Dec:19:55 aka_mc2:#qc] ALAN: do you think that the Crusoe processor, which Linux supports, will be considered for all these programming techniques?
Alan | aka_mc2: Crusoe is very hard to deal with - the system emulates an x86 and it adjusts its emulation according to things it learns at runtime. That means it can learn what seems to need prefetching and many other things a normal processor cannot. How much of that it actually does I don't know. |
Alan | sklav:#qc] i have noticed higher load averages after i use a kernel with -O3 and/or -O5
Alan | sklav: much of that is actually cache related - gcc -O3 and -O5 unrolls loops which makes them use a lot more memory and on modern cpus is a bad thing to do |
Alan | really it is a bug in some gcc's that it does this too much |
Alan | jmgv:#qc] Alan: don't you think a lot of the work about register usage and other questions depends on the compiler, and that makes us lose some control over those issues?
Alan2 | jmgv: true - but do you want to hand optimise one megabyte of code ? |
Alan2 | jmgv: for the kernel we actually write small critical pieces of code in assembler in some cases - things like memcpy for example
Alan2 | there are other bits where the C is written so that the compiler outputs the right code rather than the obvious way |
Alan2 | Rapiere] If GCC improves one thread's cache use, won't this spoil multi-threading interactivity?
Alan2 | Rapiere: the scheduler is dealing at a much higher level - and the decisions it makes which are designed for best cache performance are the right ones anyway fortunately |
Alan2 | sapan:#qc] Alan2: you said "we know that only memory of certain sizes at certain offsets can be cached" could you explain?
Alan2 | The processor uses parts of the address to indicate which bit of the cache to look in |
Alan2 | To the CPU an address really looks like [Page Number][Cache line][index into cache] |
Alan2 | So the cache always caches on a 64 byte boundary on an athlon |
Alan2 | In addition if we have lots of data with the same cache line number we can only cache two or four of those bits of the data
Alan2 | the cache can't store any block of data in any place |
Alan2 | Ok shall we go onto more general questions for a bit (Rik when is the next talk scheduled ?) |
riel | Next talk will be tomorrow at 1800 UTC |
Alan2 | coywolf!jack@210.83.202.168] why do you think the windows GUI is far faster than the linux GUI?
Alan2 | coywolf: because they didn't attend my lecture 8)
Alan2 | coywolf: but you should go try xfce/rox even on a 32Mb PC 8)
Alan2 | sh0nX:#qc] since we're offtopic now: Alan2: Do you have a patch for the amd76x_pm module for 2.5.xx?
Alan2 | shonx: it shouldn't be very hard to port but I don't think anyone has ported it yet
Alan2 | (sh0nX:#qc] Alan2: im trying to port it right now)
Alan2 | shonx: cool
Alan2 | sapan:#qc] Alan2: I have an iPAQ with familiar running 2.4.18-rmk - if I were to optimize things in the kernel/apps in general, what should I be looking at?
Alan2 | sapan: I'm actually not that familiar with the ARM internals. The same general things should apply
Alan2 | sapan: obviously there are other considerations on a handheld too - lack of a disk, power saving etc |
Alan2 | E0x:#qc] Alan2 what is your preferred processor?
Alan2 | E0x: this varies. I love the raw speed of the Athlon but hate the reliability and the heat problems
Alan2 | At the moment I am playing with VIA C3/VIA Eden processors - which are quite slow but are designed to be very power efficient - no fan needed |
Alan2 | this makes for very quiet and cheap systems |
Alan2 | plus small boards people can do crazy things with - like put them into old sparc boxes, or even a gas can |
Alan2 | (www.mini-itx.com) |
Alan2 | ridiculum] what's your opinion about itanium2? is it better than hammer?
Alan2 | ridiculum: right now I am betting firmly on the hammer
Alan | As to why the athlon reliability is a problem Im not sure - I've had real problems with getting reliable memory on the dual athlon, heat problems and a lot of hardware incompatibility |
Alan | but it does go awfully fast once it works |
Alan | apuigsech:#qc] Alan, in the GDT table we can find some null descriptors (not used) - is that to gain optimization on cache memory usage?
Alan | apui: actually several of those gaps are because we used to use them for things and wanted to keep some data the same, others have fixed values required by standards, or for windows bug compatibility in the bios - so not the cache this time |
Alan | rene] (so that CPUs don't trample on each other's cache lines)
Alan | rene: we have to space some things out for that |
Alan | rene: One example is that the kernel has a structure that describes each page of memory |
Heimy | rene] (so that CPUs don't trample on each other's cache lines)
Alan | Various people went to great pains to make that structure exactly 64 bytes on a PC
Heimy | ooops |
Alan | sh0nX] Alan: do you visit #kernelnewbies? :) |
Alan | shonx: not often enough - its a really important project
jmgv | <davej> folks interested in the prefetching stuff Alan talked about may find the presentation at http://208.15.46.63/events/gdc2002.htm interesting |
riel | ok, the questions seem to be slowing down |
riel | I guess it's time to wrap up the "official" part of this talk |
riel | I'd like to thank Alan for this interesting talk |
riel | and I'd like to remind everybody else of the other UMEET lectures we'll still have |
jmgv | we thank Alan Cox for his efforts
riel | you can see the full program here http://umeet.uninet.edu/umeet2002/english/prog.eng.html |
MJesus | clap clap clap clap clap clap clap clap clap clap |
riel | clap clap clap clap clap clap clap clap |
Ston | clap clap clap clap clap clap clap clap |
jmgv | clap clap clap clap clap clap |
rp | clap clap clap |
sh0nX | clap clap clap clap clap clap |
angelLuis | plas plas plas plas plas plas plas plas plas |
mulix | clap clap clap clap clap clap clap clap |
mips | hahaha |
casanegra | clap clap clap |
varocho | clap clap clap |
bit0 | clap clap clap clap |
apuigsech | x) |
NiX | clap clap clas clap clap |
jacobo | clap |
ms | clap clap |
sarnold | clap clap clap clap clap :)) |
rp | great one |
jeffpc | clap clap clap clap clap clap clap clap clap clap clap |
HPotter | plas plas plas |
mulix | *-* *-* *-* *-* *-* *-* |
_Josh_ | alan rules!!! |
Geryon | great talk :) |
Karina | clap clap clap and more clap :)
Chico | plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas |
angelLuis | torero! bravo!!!! |
error27 | clap clap clap |
Baldor | clap clap clap clap clap |
sh0nX | clap clap clap clap clap clap (2 more times) |
drizzd | clap clap clap clap clap clap clap clap clap |
BorZung | plas plas plas plas plas plas plas plas plas |
_Yep_ | thanks |
mcp | oh what braindead people |
ibid | clap clap clap |
sapan | clap clap |
EleTROn | VIVA Alan ! |
NiX | congratulations!
angelLuis | :)) |
Chico | Good
pask | docelic JUAS JUAS JUAS |
EleTROn | Alan at the Forum Internacional de Software Livre in Brazil 2003
Heimy | clap clap clap clap clap clap clap clap clap clap |
EleTROn | VIVA |
Heimy | (sorry, I was translating) :-) |
mcp | sarnold: hehe |
sarnold | ... if only alan hadn't had lag problems... i guess NTL hasn't fixed all his problems. :( |
sh0nX | hehe |
Ston | errr |
rp | is alann coming back |
MJesus | ¡¡Viva Alan!! |
rp | s/alann/Alan |
Ston | where is he?
sh0nX | hey now |
casanegra | nu ce :S |
sarnold | I'd like to mention that Milton's Cisco presentation has been replaced by james morris; he will be presenting on the new 2.5 kernel cryptography support |
angelLuis | has he missed the applause???
raciel | good talk Alan! |
mips | EleTROn: I arrived and the guy had just finished talking
mips | hahahaha |
sh0nX | :-) |
mips | rubbish
EleTROn | <mips> hauhauaha |
mips | only saw the msg now.
riel | MJesus: fast action |
mips | that barbanegra sent me
sarnold | MJesus: nice :) |
sh0nX | I'd like to thank the UMEET people for getting Alan to speak today :-) |
EleTROn | <mips> it was great
angelLuis | MJesus: very good!! |
sh0nX | it was very informative, and I learned a lot more about SMP :) |
rp | clap for UMEET |
Chico | very nice, Mª Jesus |
mips | EleTROn: I'm not going to die over it =) I'm no great fan of those crazy guys
EleTROn | mips: me neither :)
angelLuis | hurra for UniNet.edu!!!!!! |
* riel knows the netmask of the real alan |
sh0nX | heh |
Ston | riel: where is Alan ? |
angelLuis | riel: :)) |
sh0nX | riel: i think it was visible before |
* rp does not know netmask of real alan |
mulix | imitation is the sincerest form of flattery |
sarnold | Ston: probably ping timeout :( |
riel | Ston: at home, probably eating something now |
sh0nX | but im not going to mention it |
sarnold | mulix: except in the case of coywolf :-/ |
jacobo | mulix: it depends on the quality of the imitation ;) |
Ston | jejeje ok =) |
freddy | Is there anyone from Mexico here?
riel | he must be hungry after two hours of presentation |
Megatron | me
jacobo | bye |
rp | how long was the *full* presentation?
freddy | Aren't you by any chance David Limon?
debUgo- | talking makes him thirsty? |
juan | has the conference finished? |
sarnold | rp: about 2.25 hours |
debUgo- | X) |
mips | EleTROn: que é? |
bit0 | juan: yes |
riel | debUgo-: dunno about Alan, but it usually works for me ;) |
sarnold | juan: alan's presentation is over, but there is still one more week of uninet presentations. :) |
debUgo- | heheh |
sh0nX | :) |
*** Zeno (fltak@zeno.student.utwente.nl) Quit (Lost terminal) |
Ston | the average number of people in the channel during the talk was 260 hehehe xD
Ston | the highest number I saw was 280 xD
debUgo- | riel: at least that you speak as you type (too) heh |
Megatron | freddy sip |
jmgv | really good |
Heimy | Well. |
debUgo- | that would be funny |
Heimy | I dunno if he's thirsty |
MJesus | and in #redes more than 100 additional people
Heimy | But his wrists should be in pain right now :P
mips | huh |
Ston | MJesus: 123 ;-) |
rp | who got Alan to give this talk? |
Ston | uh 132 :) |
riel | Heimy: that happened to me, after my presentation |
drizzd | Heimy: you can tell, hmm? |
Heimy | :-)) |
riel | Heimy: I just had to go away from the keyboard for a while ;) |
Heimy | drizzd: Me? Why? |
sh0nX | riel ;-) |
MJesus | for the translators:
drizzd | Heimy: because you had to type as much as did |
MJesus | clap clap clap clap clap clap clap clap clap clap |
Heimy | I only translated half of his presentation :-) |
drizzd | s/as/he |
pask | its enough? |
jmgv | rp: umeet got Alan. at umeet there are no individuals, umeet is a group
MJesus | translators to Spanish: arador, jacobo and heimy (with vizard)
pask | clap clop clup |
MJesus | clap clap clap clap clap clap clap clap clap clap |