*** riel sets mode: +m
* riel sets the channel to moderated
rielI guess it's better if we let Alan have a few minutes of rest before the talk
rielOK, welcome to this UMEET lecture
rieltoday Alan Cox will hold a talk titled
riel   "Optimising for modern processors"
rielin order to keep this channel readable, we have set it +m
rielso only the ops can talk
rielif you have a question during the lecture, you can always ask it in #qc
rielyou probably already know Alan Cox, who is one of the driving forces behind Linux kernel development
rielhe is a man of many talents though
rielin fact, he even prepared slides for this talk
rielyou can find those on:
rielhttp://www.linux.org.uk/~alan/Slides/
rielAlan, go ahead when you're ready
AlanOk
AlanThis talk is partly about how modern processors work
AlanMostly however its about why this changes the way you need to program to get the best performance
AlanBy modern processors we really mean anything from the pentium onwards - in some ways from the 486 onwards
AlanTen years ago a 40MHz processor was pretty fast. Today the same position is occupied by a 3GHz processor
AlanMemory has not increased speed to cope with this, and more importantly it has not improved in latency (the time from asking for a piece of memory to getting it) much at all
AlanThe new processors also can execute multiple instructions each clock cycle, so in fact the processor might want to be accessing memory not 100 times faster as you might think from the clock rate change but near 500 times faster
AlanTo deal with this processors added cache memory. It is possible to build systems which just have very fast memory but its incredibly expensive
AlanThe sort of computer you find on your desktop today has a very slow memory subsystem - things like 133MHz SDRAM and DDR RAM have improved the data rate but not enough, and have done little to improve the access time for a given piece of data
AlanTo give you an idea how slow main memory is compared with the cache I measured the copying speed of data in the on processor cache (called the L1 or level 1 cache) and the larger slower cache (The L2 or level 2 cache)
AlanOn an Athlon the L2 cache was six times slower for copying than the L1 cache
AlanMain memory is eight times slower than the L2 cache
AlanSo every piece of data you have to fetch from main memory you could have fetched fifty from the cache
AlanThis makes keeping the right things in the cache extremely important, as well as knowing how the cache works so that you can understand what is needed to get the best use from it
AlanIn 'real world' terms if your L1 cache was your desk and took 1 second to access your main memory (your filing cabinet say) would take one minute for each item you had to find
AlanThe same things are true for pretty much all modern processors. The newer the processor quite often the larger the gap because the processor is getting faster more rapidly than the memory
AlanWorse still there are physical limitations on how fast the memory can go, and how quickly signals can travel across the motherboard - the speed of light really is too slow nowadays
AlanThe obvious question then is what is in the cache. If we know what is in our cache and what data it will keep we have some idea how we want to write our programs
Aland33p] can the cache be directly manipulated by a programmer.. I
Alan         would have thought it wasn't?
Aland33p: in the normal cases you can't directly control the cache.. but you can understand how the cache will behave
Aland33p: there are instructions on newer processors where you can help the cache along - but thats the last slide of the talk 8)
AlanThe cache holds the most recently used code and data. So if you execute a loop the loop will end up in the cache
Alan> similarly if you are looking at a list regularly the list contents will end up in your cache
AlanBecause the cache is quite primitive in some ways the actual data it can store in each piece of the cache (each cache line) is quite restricted.
AlanThe processor doesnt have time to look at all of the cache to see if a piece of data is already in the cache. Instead it breaks the address up into several pieces
AlanThe upper bits of the address (address & ~4095) go to the memory management hardware to turn a virtual address into a physical address
AlanAt the same time the lower bits are passed to the cache. The cache looks at the remaining bits (ignoring the lowest 4-6 depending on the processor)
Alanand it looks in two or four places to see if the data it needs is present. if it is then it uses this data, and cancels the work the memory management hardware is doing
AlanThat limitation has some fun effects which I'll demonstrate later on
AlanNot all of memory is cached of course - it would be a bad idea if data was cached that was for your display and you didnt see the text because the processor had it
Alaninit64] Alan_Q : some caches have another way to find lines. They
Alan         use 1 "comparator" per cache line
Alaninit64] but I guess it's too expensive for big amounts of cache
Alaninit64: basically yes - the more complex an algorithm the longer it takes to run - even in hardware
AlanWith a 3GHz processor you don't have very long to decide if something is in cache or not
AlanThat is also one reason it is common to have a small fast L1 cache, and a larger slower (but smarter) L2 cache
AlanThe processors normally deal with memory in chunks of 16, 32 or 64 bytes.
AlanSo each piece of cache holds chunks of those sizes and aligned to that size. The chunks get bigger as the L1 caches get bigger generally
AlanAll of that is loaded at the same time. When you ask an Athlon for a byte of data it will load the entire 64-byte chunk containing it.
AlanThis means several things - one of which is that if you are going to use data, put the data you use together
AlanThe kernel goes to great pains to put structures in an order where data that is used together is close together
Alanbecause if you loaded one bit of that data you have the rest anyway
AlanOk now a first demonstration of why knowing about caches matter is demo1 (http://www.linux.org.uk/~alan/Slides/slide5.html)
AlanThis is a very simple program that writes 4096 values into memory
Alan(we run it lots of times to get some numbers)
AlanWe run this with the data spaced out on 1,2,4,8,16,32 and 64 byte boundaries
Alanjust like updating an array of different sized structures
Alantarzeau] how do you run demo1 in single user mode?
Alantarzeau: it'll give you reasonably reliable answers if you just run it without too much else going on
Alaneven multiuser
AlanSo what demo1 does is much like a large number of perfectly normal applications.
AlanYou'll notice that even on a Pentium IV with very good memory and large caches the performance drops considerably
Alanifvoid] how large are the P4 L1 and L2 cache?
AlanThat varies depending upon whether it is a Xeon or not
AlanThose numbers are from a Xeon with 512K of L2 cache and I think 64K of L1 cache
AlanIf you run the program on something like a Celeron then you would see a much more rapid reduction in performance
Alanyou'd probably also want to change the for loop to do 100000 not 1000000 or you'll be waiting for it all night
AlanYou can actually use techniques like this to find the properties of the cache on a processor
AlanWe don't do that in Linux because the kernel knows how to ask the processor properly for the data (and puts much of it into /proc/cpuinfo)
AlanWhat this tells us is that if you are going to scan large blocks of data, you want the values you are scanning to be together in memory
AlanIf we do 4096 comparisons of values close to each other we could be several times faster than if we looked at one field in each element of an array
AlanThat demonstrates how important careful planning is.
AlanOf course in many cases you can use trees, hashes or other much more intelligent data structures to achieve the same results or better
AlanThe second demonstration is designed to show something else
AlanOn older processors it was very common to use lookup tables for things like division in 3D games. With a modern processor this isnt always so clear
AlanDemo2 finds out how long it takes to do a lot of divisions, then compares it with using a lookup table to do the same thing
AlanOn the pentium4 once the lookup table exceeds 128K the performance is actually better by doing the maths - even though divide is an extremely costly operation
AlanThis is because looking data up in main memory is actually more expensive than doing division
Alan(again on a slower box you may well want to make the loop somewhat smaller)
Alan[14-Dec:18:37 smesjz] Step of 1 across 4K took 50 seconds. (p133/2.5.49) :)
Alansmesjz: Gives you an idea how much processor performance has changed
AlanThe division one is quite interesting because it depends heavily on the processor
Alanon something like a pentium the lookup table will be way way cheaper
Alanbut by the time you reach the athlons and PIII/PIV it becomes a lot less clear
Alansmesjz] Alan_Q: it surprises me that 5 out of demo1 tests
Alanreturns 50 seconds runtime (100k iterations)
Alansmesjz: To find out why you'd really have to look at what was going on more deeply - I don't know either, unless your memory is the real limit
Alan[14-Dec:18:41 rp] can someone please explain exact meaning of 1024 tablesize?
Alanrp: the different runs access data from lookup tables that are 64K, 128K and so on
Alanso 1024K lookup table means the program is simulating random accesses to 1Mbyte of lookup data
Alanif you run this with bigger and bigger sizes eventually the performance becomes about constant. That gives you a good idea that the cache is no longer helping out
AlanThis particular test is very important for things like image processing, JPEG compression and the like
Alanone of the reasons that MMX is such a help for video processing is that it lets you do a lot of processing at the same time, so you can avoid lookup tables when doing things like colour conversion
Alanridiculum] is there in linux any tool like Vtune (intel) to
Alan         debug things like alan is explaining?
Alanridiculum: There are two - there is an open source thing called "oprofile", and there is Intel vtune which is expensive and requires a second windows PC and other things
Alanridiculum: you can look at a lot of the statistics because the newer processors have debugging registers
Alanthey allow tools like oprofile to ask the processor "how many cache misses", "how often did you have to wait for data"
Alanand other similar questions.
Alanhttp://oprofile.sourceforge.net/ is the OProfile profiler
AlanSo we've got some simple demonstrations of how important the cache is
Alan[14-Dec:18:48 avoozl] valgrind also might be interesting to look at, the
Alan         latest version also can do cache simulation and show where in a
Alan         program cache misses are occuring
Alanavoozl: yes I had forgotten valgrind can do cache simulation too
Alantarzeau] were there CPUs without cache? Intel's 80286?
Alantarzeau: there were a lot. The 286 was almost never faster than the RAM it was attached to - similarly on the Amiga the RAM is almost twice as fast as the processor
AlanThe amiga actually used that trick to give the processor and support chips shared access
Alanprocessor and chipset having alternate access
AlanSo what have we learned about the cache and making good use of it
AlanWell - we know that when we get data we get it in chunks so we can put things we use in the same place.
AlanThat helps the processor and also happens to help virtual memory (when you swap data to disk you do so in 4K chunks so you may as well keep data together for that too)
AlanWe've demonstrated that you want to keep your processing fitting within the cache. One reason Intel sell expensive processors with very large caches is that databases find it hard to do this
Alanso the Xeons and the really expensive pentium-pro with 1Mb caches were good for database work
AlanWe also know that only a certain amount of data at a given alignment can be cached
AlanThe kernel actually uses special memory allocators to try and scatter objects the kernel uses onto different alignments specifically because of this
Alantarzeau] those edo memory sticks were 60ns and 70ns, how many ns are
Alan         L1 and L2 caches?
Alantarz: for modern processors Im not actually sure - even on the 486 L2 cache was about 16ns
AlanIf you have an array of objects that are power of two sized you are likely to be getting almost worst possible performance from the caches
Alanso it's a useful trick to add a little extra unused space to each block of memory to pad out the array elements so they cache better
AlanOk time for the next demonstration
Alanhttp://www.linux.org.uk/~alan/Slides/slide7.html
Alan(there isn't a demo3.. I took it out to make the talk fit the time better 8))
AlanThis is designed to show how different ways of doing something can have different performance because of the caches
Alanwhat it actually does (adding numbers) is fairly trivial, but its not that unlike real programming examples
AlanThe first run generates a large set of data, and then adds it up. Generating sets of data then processing them, then processing the results is a very common way of programming
Alanbut it can actually give the worst possible behaviour
AlanThe second run we add the data up as we generate it, and get much better performance.
AlanThis is mostly because the first run we end up emptying all the data out of the cache and then loading it back in again
AlanIn the second case because we add as we go the data only ever leaves the cache once
AlanThe final case shows how much of the operation is the actual overhead
AlanWhat this means is that for any large amount of data and computation it is important to work on it in chunks.
AlanEngineers and high performance computing people do this all the time - the GIMP knows about it too
AlanMany things the GIMP does in its filters it does using rectangles of the image rather than applying each change to the entire image one after another
AlandebUgo-] Alan_Q: how much does cache associativity affect
Alan         general memory performance?
Alandebugo: keeping data in cache makes a real difference to overall performance - mostly on SMP systems, which is where the next few slides go
AlanThere are lots of algorithms for this and the same techniques are actually used for clusters and beowulfs - only they are trying to minimise messages over ethernet so it's much much more important than on a single system
AlanAll of this stuff about caching matters much much more when you have a multiprocessor PC
AlanLess so on the bigger alpha and sparc machines because they have memory systems designed for multiple processors
AlanA dual athlon or dual pentium III/IV however is two processors on the same memory bus
AlanThe 3 demonstrations have already shown that with a single processor the memory performance is not up to the processor
AlanSo a dual processor machine gives us twice the problem
AlanOne of the demonstrations you can do is to run a continuous large memory copy on one processor and time the performance of copies on the other - on some dual PC machines the copies being timed will perform at 1/3rd of the speed they run without the other copying loop
Alanifvoid] Alan_Q: won't that change for the Hammer and Itanium 3?
Alanifvoid: hammer lets you attach memory to each processor, the more processors you add the more memory controllers you can add
Alanifvoid: it depends what the cost of that is whether vendors will do it
Alanrp] so does that mean memory performance does not depend only on
Alan         processor but also on bus speed?
Alanrp: yes
Alanrunlevel0] <Alan>So a dual processor machine gives us twice the
Alan         problem  :so this explains why we do not get 2x the performance of 1
Alan         processor
Alanrunlevel0: there are two reasons you don't get twice the performance
Alanthe first is that you are sharing a memory which is not fast enough
Alanthe second is that there is a cost in stopping the system from doing the wrong two things at the same time
Alanthe kernel has to do real work to stop two people allocating the same memory, using the same disk block and all the other things we dont wish to happen
AlanThe only reason a dual processor PC is usable at all is because most memory accesses are coming from the cache in normal usage
Alaneach processor has its own cache (except some dual pentium machines which are just painful 8))
Alanridiculum] what about hyperthreading and cache coherence?
Alanridiculum: hyperthreading shares the cache between the two execution units on that processor
Alanso you get to do two things at once but each application will suffer more cache misses
Alansh0nX] so this is where spinlocks come in
Alansh0nX: right - thats the main thing the kernel uses to synchronize things internally
AlanOne thing the processors have to do is to ensure that the two processors dont cache different versions of the same data or miss changes the other processors make
Alan docelic] Id appreciate more on spinlocks too
Alandoc: we'll talk about that a bit after the main talk
Alanbzzz] Alan_Q: how pci devices may see data which in cache only?
Alanbzzz: the processors as well as making sure they see each others changes do the same with devices on the PCI bus
Alanbzzz: The standard caching technique is a thing called MESI
AlanThat stands for the four states each piece of the cache can be in
AlanWe have an "M" - or modified state. That means this piece of information is something this processor has changed and that we have data the other processors dont know about yet
AlanWe have an "E" - Exclusive state - where we know nobody else has this data but we do
AlanWe have an "S" or shared state, where we know we have a copy of the data but other people also have copies in shared state
Alannobody has it modified
Alanand we have "I" or Invalid - where we don't have a clue what is going on but we know we don't have the data
AlanAt any point two processors cannot have the same data except in shared state
AlanWhen we modify some data we change the state on it - if it was exclusive it becomes modified
AlanIf it was shared we have to kick each of the other processors and make them get rid of their copies
AlanIf we dont have a copy (I) we must ask for it - this like moving from shared can be quite expensive
AlanIf another processor had a copy in modified state we have to ask that processor to write it back to memory and then read it ourselves
AlanWhat we want to avoid at all costs is having two processors continually modifying the same data
AlanThis turns into a sort of food fight on the memory bus
Alanand we spend most of our time passing data back and forth between the processors
AlanThat doesn't get a lot of work done - and once you want to scale to big computers it becomes very important indeed to avoid it
AlanIBM have been doing a lot of work on kernel code where these kind of fights can occur as they have 16 processor systems
Alanwhich make it very apparent when you get this wrong
Alan[14-Dec:19:17 sh0nX] Alan_Q: so we should be using the cache for SMP
Alan         processors to keep data that isnt going to change much and use the
Alan         processors to handle data that does change often?
AlanshonX: there are systems where it makes sense to have heavily shared data uncached. The way the PC hardware works really stops you doing this
AlanEven if you make it uncached it is still slow
AlanMost of the time this is not a problem - applications dont share a lot of things anyway
AlanThreaded applications tend to share very little data thankfully
AlanWhen you design threaded and SMP applications it is important to minimise the amount of time data spends bouncing between processors
AlanSo for example if you were doing JPEG encoding on a multiprocessor system it would be better to use one processor to do the top half of the image and the second processor to do the bottom half
Alanthan to have one processor do colour conversion and the other processor do the compression pass
Alanwhen it comes to things like mpeg encoding this gets quite tricky
AlanIn addition it is possible to get what is called "false sharing"
Alanh0nX] Alan_Q: so we want to keep both processors doing OTHER
Alan         things
Alanshonx: exactly
Alanshonx: like people, processors work best when they are not falling over each other
AlanFalse sharing occurs because the processor cache works in 32 or 64 byte chunks
AlanIf you happen to put two unrelated pieces of data in the same 64 bytes you might accidentally have one thing used by each processor in the same cache line - and start a fight
AlanThus people pad out such structures to make them bigger and avoid this
Alanor they keep them apart
Alan(padding them out avoids sharing but it means you use more cache of course - so you are doing what demo1 said not to do)
Alansh0nX] Alan_Q: so when designing SMP applications, how do we
Alan         tell which processor to handle which data without causing the
Alan         processors to both handle the same data?
Alanshonx: the scheduler tries to keep a given thread running on the same processor as much as possible
Alanso it's just a matter of avoiding accessing the same data a lot in two different threads
AlanSimilarly we try and keep a given application running on the same processor so that we dont spend a lot of time copying stuff from one processor to another
Alan[14-Dec:19:25 yalu] Alan_Q: is the scheduler smart enough to keep threads who
Alan         share a lot of data on the same processor?
Alanyalu: it makes some simple guesses
Alanbut it's actually very hard to measure the real amount of sharing efficiently
Alanespecially since read only sharing (eg of code) is fine
Alanzwane] Alan_Q: All this must get really interesting with
Alan         Hyperthreaded cpus
Alanzwane: There are reasons Ingo is still fiddling with getting the best performance off such processors 8)
AlanWith hyperthreading you sort of have two processors per cache
Alanand the cache has some other odd internal limits too
Alanzwane] Alan_Q: do you reckon scheduler only would suffice? How
Alan         about leveraging cpu affinity for say doing bias in interrupt
Alan         handling?
Alanzwane: There are good arguments in some cases for having a process wake up on the CPU that handled an interrupt. In most cases however it isnt anything like as valuable as you would think
Alanmost of the process data is cached on the cpu that last ran it
AlanMost good I/O devices use DMA - so they write to memory themselves, and the memory they wrote to has been removed from all the processor caches (since they modified it)
Alanthere are good reasons for sticking interrupts to specific processors
Alan(if processor 1 has all the data for eth0 cached then why handle the interrupt on processor 2)
Alansh0nX] so, if a program is written for UP, how does the kernel
Alan         scheduler handle its data on two CPUs? or it can't
Alansh0nX: the scheduler can't split up something with only one thread of execution. It can spread different applications around - so it can run your game on one processor and the X server on the other
Alansarnold] Alan_Q: does linux currently have a mechanism to
Alan         specify that all interrupts should be handled by a specific [set of]
Alan         CPUs?
Alansarnold: it has some stuff that Ingo did, it's at the obscure and wondrous end of kernel tuning
Alansh0nX] I see, so we have to use threads in our code in order to
Alan         benefit from SMP
Alansh0Nx: or two programs - sometimes that is just as easy or easier
AlanThere is one last subject for this talk, then we can move onto most of the questions
AlanSomeone asked early on about helping the cache out
AlanOn a modern processor you have instructions like "prefetch" and "prefetchw"
AlanThese allow you to tell the processor you will be needing data in the future
AlanSo instead of getting stuck waiting for data to arrive from memory you can tell the processor in advance
AlanThe big problem with this is you often don't know well in advance which memory you will need
AlanA memory copy is easy - and the Athlon memory copy in Linux actually keeps saying "and prefetch me 320 bytes ahead of this point"
AlanSimilarly things like graphics processing benefit immensely as do programs that use large arrays of data in predictable fashions
Alan(fortran does very well here strangely enough)
AlanWe use this in the kernel for memory copies and sometimes for lists
Alanit is hard to use for lists because memory is so slow you want to say "prefetch me about five or six items ahead"
Alan<translator wait> [prefetch me a translator ;)]
* riel dcc's the crowd some virtual beers
AlanOk the translator's fingers seem to have caught up
AlanWhat we actually need to make this sort of thing work is new data structures
Alanone of the common approaches is to have lists which know next/previous but also know 'five items on' and 'five items back'. We don't do this in the kernel currently
Alanbut it may be something we must look at in the future as processors get faster still
AlanThe final useful thing prefetch is used for in the kernel makes use of the Athlon 'prefetchw' which says "I want this data soon, and I will write to it"
Alanunlike prefetch this gets an exclusive copy of the data. We use this for prefetching locks - which is something that is very expensive if it has to go to main memory
AlanIt is very common for a lock structure to belong to another processor and we often know the lock is going to be used so can prefetch it
Alansh0nX] I assume we use some sort of spinlock to prevent another
Alan         processor from prefetching the same data?
Alan2uggh.. lag 8(
Alan2We don't actually lock that
Alan2very occasionally we prefetch it and it is stolen by another cpu then fetched back again
Alan2it happens so rarely it is cheaper not to worry
AlanAh .. back again
Alan2or not as the case may be
Alan2Also if you had a lock for the lock - you would want to prefetch for the prefetch
Alan2and so on repeatedly
Alan2So in the kernel we treat prefetch very much as a hint
Alan2if it does the right thing most times then it is fine..
Alan2Ok that is really the end of the main part of the talk
Alan2hopefully it has given people some ideas of why caches matter
Alan2and a bit about programming with them in mind
Alan2If we can start with on topic questions before we wander off that would be best
rielI guess people should ask the on topic questions in #qc
rielso we can leave #linux moderated for a few more minutes
Alan2sarnold:#qc] Alan2: i've wondered if prefetching cuts memory
Alan2         bandwidth significantly.. have people tested with prefetch config'ed
Alan2         away?
Alan2sarnold: we've done a fair amount of testing. Most of the time prefetching actually helps use memory bandwidth that would otherwise be wasted
Alan2The athlon one was so fine tuned that we broke some VIA chipsets due to a hardware bug though 8)
Alanrene:#qc] Alan2: talk seemed to be about caching alone. do
Alan         things like instruction alignment make a lot of difference on modern
Alan         processors?
Alanrene: they matter a bit - it depends on the processor how much. gcc does know how to get these right when you pick a processor type. Normally however it is under 1%
AlanArador:#qc] Alan2: what're the effects of preempt on caching?
Alanarador: the more you switch between tasks the less useful the cache gets
AlanPre-empt doesn't really make a lot of difference
AlanIt is however why systems designed for a lot of simultaneous users have a lot of cache
Alan[14-Dec:19:55 aka_mc2:#qc] ALAN: do you think that the Crusoe processor, which Linux
Alan         supports, will be considered for all these programming techniques?
Alanaka_mc2: Crusoe is very hard to deal with - the system emulates an x86 and it adjusts its emulation according to things it learns at runtime. That means it can learn what seems to need prefetching and many other things a normal processor cannot. How much of that it actually does I don't know.
Alansklav:#qc] i have noticed higher load averages after i use a
Alan         kernel with -O3 and/or -O5
Alansklav: much of that is actually cache related - gcc -O3 and -O5 unrolls loops which makes them use a lot more memory and on modern cpus is a bad thing to do
Alanreally it is a bug in some gcc's that it does this too much
Alanjmgv:#qc] Alan? don't you think a lot of the work on register
Alan         use and other such questions depends on the compiler, and that makes
Alan         us lose some control over those issues?
Alan2jmgv: true - but do you want to hand optimise one megabyte of code ?
Alan2jmgv: for the kernel we actually write small critical pieces of code in assembler in some cases - things like memcpy for example
Alan2there are other bits where the C is written so that the compiler outputs the right code rather than the obvious way
Alan2Rapiere: If GCC improves one
Alan2         thread's cache use, won't this spoil multi-threading interactivity ?
Alan2Rapiere: the scheduler is dealing at a much higher level  - and the decisions it makes which are designed for best cache performance are the right ones anyway fortunately
Alan2sapan:#qc] Alan2: you said "we know that only memory of certain
Alan2         sizes at certain offsets can be cached" could you explain?
Alan2The processor uses parts of the address to indicate which bit of the cache to look in
Alan2To the CPU an address really looks like [Page Number][Cache line][index into cache]
Alan2So the cache always caches on a 64 byte boundary on an athlon
Alan2In addition if we have lots of data with the same cache line number we can only cache two or four of those bits of the data
Alan2the cache can't store any block of data in any place
Alan2Ok shall we go onto more general questions for a bit (Rik when is the next talk scheduled ?)
rielNext talk will be tomorrow at 1800 UTC
Alan2coywolf!jack@210.83.202.168* why do you think the windows GUI is
Alan2         far faster than the linux GUI?
Alan2coywolf: because they didnt attend my lecture 8)
Alan2coywolf: but you should go try xfce/rox even on a 32Mb PC 8)
Alan2sh0nX:#qc] since we're offtopic now: Alan2: Do you have patch
Alan2         for the amd76x_pm module for 2.5.xx?
Alan2shonx: it shouldnt be very hard to port but I dont think anyone has ported it yet
Alan2shonx: cool
Alan2(sh0nX:#qc] Alan2: im trying to port it right now)
Alan2sapan:#qc] Alan2: I have an iPAQ with familiar running
Alan2         2.4.18-rmk - if I were to optimize things in the kernel/apps in
Alan2         general, what should I be looking at?
Alan2sapan: Im actually not that familiar with the ARM internals. The same general things should apply
Alan2sapan: obviously there are other considerations on a handheld too - lack of a disk, power saving etc
Alan2E0x:#qc] Alan2 what is your preferred processor?
Alan2E0x: this varies. I love the raw speed of the Athlon but hate the reliability and the heat problems
Alan2At the moment I am playing with VIA C3/VIA Eden processors - which are quite slow but are designed to be very power efficient - no fan needed
Alan2this makes for very quiet and cheap systems
Alan2plus small boards people can do crazy things with - like put them into old sparc boxes, or  even a gas can
Alan2(www.mini-itx.com)
Alan2ridiculum]  what's your opinion about itanium2? it's better than
Alan2         hammer?
Alan2ridiculum: right now I am betting firmly on the hammer
AlanAs to why the athlon reliability is a problem Im not sure - I've had real problems with getting reliable memory on the dual athlon, heat problems and a lot of hardware incompatibility
Alanbut it does go awfully fast once it works
Alanapuigsech:#qc] Alan, in the GDT table we can find some null
Alan         descriptors (not used) - is that to gain optimisation in cache memory
Alan         usage?
Alanapui: actually several of those gaps are because we used to use them for things and wanted to keep some data the same, others have fixed values required by standards, or for windows bug compatibility in the bios - so not the cache this time
Alanrene] (so that CPUs don't trample on each others' cache lines)
Alanrene: we have to space some things out for that
Alanrene: One example is that the kernel has a structure that describes each page of memory
AlanVarious people went to great pains to make that structure exactly 64 bytes on a PC
Alansh0nX] Alan: do you visit #kernelnewbies? :)
Alanshonx: not often enough - it's a really important project
jmgv<davej> folks interested in the prefetching stuff Alan talked about may find the presentation at http://208.15.46.63/events/gdc2002.htm interesting
rielok, the questions seem to be slowing down
rielI guess it's time to wrap up the "official" part of this talk
rielI'd like to thank Alan for this interesting talk
rieland I'd like to remind everybody else of the other UMEET lectures we'll still have
jmgvwe thank Alan Cox for his efforts
rielyou can see the full program here http://umeet.uninet.edu/umeet2002/english/prog.eng.html
MJesusclap clap clap clap clap clap clap clap clap clap
rielclap clap clap clap clap clap clap clap
Stonclap clap clap clap clap clap clap clap
jmgvclap clap clap clap clap clap
rpclap clap clap
sh0nXclap clap clap clap clap clap
angelLuisplas plas plas plas plas plas plas plas plas
mulixclap clap clap clap clap clap clap clap
mipshahaha
casanegraclap clap clap
varochoclap clap clap
bit0clap clap clap clap
apuigsechx)
NiXclap clap clas clap clap
jacoboclap
msclap clap
sarnoldclap clap clap clap clap :))
rpgreat one
jeffpcclap clap clap clap clap clap clap clap clap clap clap
HPotterplas plas plas
mulix*-* *-* *-* *-* *-* *-*
_Josh_alan rules!!!
Geryongreat talk :)
Karinaclap clap clap and more clap :)
Chicoplas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
angelLuistorero! bravo!!!!
error27clap clap clap
Baldorclap clap clap clap clap
sh0nXclap clap clap clap clap clap (2 more times)
drizzdclap clap clap clap clap clap clap clap clap
BorZung plas plas plas plas plas plas plas plas plas
_Yep_thanks
mcpoh what braindead people
ibidclap clap clap
sapanclap clap
EleTROnVIVA Alan !
NiXcongratulations!
angelLuis:))
ChicoGood
paskdocelic HAHAHA
EleTROnAlan at the International Free Software Forum in Brazil 2003
Heimyclap clap clap clap clap clap clap clap clap clap
Heimy(sorry, I was translating) :-)
mcpsarnold: hehe
sarnold... if only alan hadn't had lag problems... i guess NTL hasn't fixed all his problems. :(
sh0nXhehe
Stonerrr
rpis Alan coming back
MJesusLong live Alan!!
Stonwhere is he?
sh0nXhey now
casanegranu ce :S
sarnoldI'd like to mention that Milton's Cisco presentation has been replaced by James Morris; he will be presenting on the new 2.5 kernel cryptography support
angelLuishas he missed the applause???
racielgood talk Alan!
mipsEleTROn: I arrived right when the guy finished talking
mipshahahaha
sh0nX:-)
mipsrubbish
EleTROn<mips> hauhauaha
mipsI only saw the message just now.
rielMJesus: fast action
mipsthat barbanegra sent me
sarnoldMJesus: nice :)
sh0nXI'd like to thank the UMEET people for getting Alan to speak today :-)
EleTROn<mips> it was great
angelLuisMJesus: very good!!
sh0nXit was very informative, and I learned a lot more about SMP :)
rpclap for UMEET
rpclap for UMEET
rpclap for UMEET
Chicovery nice, Mª Jesus
mipsEleTROn: I'm not going to die over it =) I'm not crazy about those guys
EleTROnmips: me neither :)
angelLuishurra for UniNet.edu!!!!!!
angelLuishurra for UniNet.edu!!!!!!
angelLuishurra for UniNet.edu!!!!!!
* riel knows the netmask of the real alan
sh0nXheh
Stonriel: where is Alan ?
angelLuisriel: :))
sh0nXriel: i think it was visible before
* rp does not know netmask of real alan
muliximitation is the sincerest form of flattery
sarnoldSton: probably ping timeout :(
rielSton: at home, probably eating something now
sh0nXbut im not going to mention it
sarnoldmulix: except in the case of coywolf :-/
jacobomulix: it depends on the quality of the imitation ;)
Stonhehehe ok =)
freddyIs there anyone from Mexico here?
rielhe must be hungry after two hours of presentation
Megatronme
jacobobye
rphow long was the *full* presentation
freddyAren't you by any chance David Limon?
debUgo-talking makes him thirsty?
juanhas the conference finished?
sarnoldrp: about 2.25 hours
debUgo-X)
mipsEleTROn: what is it?
bit0juan: yes
rieldebUgo-: dunno about Alan, but it usually works for me ;)
sarnoldjuan: alan's presentation is over, but there is still one more week of uninet presentations. :)
debUgo-heheh
sh0nX:)
*** Zeno (fltak@zeno.student.utwente.nl) Quit (Lost terminal)
Stonthe average number of people in the channel during the talk was 260 hehehe xD
Stonthe highest number I saw was 280 xD
debUgo-riel: at least you speak like you type (too) heh
Megatronfreddy yep
jmgvreally good
HeimyWell.
debUgo-that would be funny
HeimyI dunno if he's thirsty
MJesusand in #redes more than 100 additional people
HeimyBut his wrists must be in pain right now :P
mipshuh
StonMJesus: 123 ;-)
rpwho got Alan to give this talk?
Stonuh 132 :)
rielHeimy: that happened to me, after my presentation
drizzdHeimy: you can tell, hmm?
Heimy:-))
rielHeimy: I just had to go away from the keyboard for a while ;)
Heimydrizzd: Me? Why?
sh0nXriel ;-)
MJesusfor the translators:
drizzdHeimy: because you had to type as much as he did
MJesusclap clap clap clap clap clap clap clap clap clap
HeimyI only translated half of his presentation :-)
paskis that enough?
jmgvrp: umeet got Alan. at umeet there are no individuals, umeet is a group
MJesustranslators to Spanish: arador, jacobo and heimy (with vizard)
paskclap clop clup
MJesusclap clap clap clap clap clap clap clap clap clap

Generated by irclog2html.pl 2.1 by Jeff Waugh - find it at freshmeat.net!