qc14-alan.txt

--- Log opened Sat Dec 14 19:02:07 2002
tarzeau why did alan remove the "printer is on fire" joke from the kernel?
zwane it might be worth people also looking at stuff like http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf
docelic how do I ignore join/part messages for the channel in xchat ?
riel docelic: set the channel to conference mode
riel click on the arrow to the right of the line where you talk
obiwan is it ok to ask a question at any time in #qc?
riel and click the "Conf" button
viZard obiwan, yes, is ok
tarzeau how is it in irssi? /set ?
sh0nX conf?
sh0nX :)
sh0nX i dont think xchat 1.8.10 has conf
sh0nX ;/
vegai /help ignore on irssi
riel sh0nX: it does
rene tarzeau: /ignore #linux JOINS PARTS QUITS NICKS
docelic excellent, thanks riel
sh0nX riel: hmm
riel sh0nX: click on the arrow button to the right of the line where you talk
tarzeau rene: thanks
sh0nX hh there
sh0nX found it
sh0nX Alan_Q: How do we compare Athlon XP's to MP's with optimization? Since they are very close to being identical processors?
rene that 1 minute is grossly optimistic in my case
SnakeFooc alo
SnakeFooc speak spanish ?
lukep sorry, but the speed of light might be the fastest possible
Heimy SnakeFooc: if you want to read the lecture in Spanish, please, join #redes
riel SnakeFooc: the spanish translation is in #redes (I think)
Heimy SnakeFooc: This channel is only for questions
choofi riel right
muli the
macTijn hey riel :)
SnakeFooc Heimy oks ..
psy Alan_Q, how can we compare Athlon's processors with Pentium's?
muli speed of light really is too slow nowdays --- snarfed by the sigmonster
d33p can the cache be directly manipulated by a programmer.. I would have thought it wasnt?
runlevel0 hy
zwane d33p: yes, we can prefetch, forced invalidate etc
runlevel0 Will there be logs? Where? THX
zwane d33p: as well as arrange data/code to optimally use it
fernand0 yes runlevel
d33p Alan_Q: okay
obiwan Alan_Q : When we compile a program to be optimized for a specific processor, it won't work on lower-end processors (if I'm not mistaken). So if we prepare a binary to be used on many computers (say when making a distro CD), is it always necessary to take the lowest common denominater (i386 perhaps)?
Jimzy but since caching technology undergoes changes, is it really smart to program based on how the cache works? since it could change?
muli obiwan, yes... here the problem is really asm instruction families
riel Jimzy: it will always remain "too small", some things never change
SnakeFooc Alan_Q: some recommendation for the novices of linux?
muli if the old processor does not support some of the instructions in the binary, because you compiled for a newer cpu, you're out of luck
n0b0dY The log will be placed somewhere for people who lost the talk?
muli so you need to compile for the lowest common denominator - i386 in the x86 case.
manty anybody knows how to ignore joins and parts on ircII, epic or bitchx?
obiwan but if we want to run processor-intensive stuff, it would be wise to squeeze every ounch of performance from the latest and greatest processor right? that is the point of optimaztion, if I'm not mistaken
riel SnakeFooc: not related to the topic of the lecture, please ask afterwards
tarzeau how are these blocks for sparc or powerpc comparing to x86?
riel n0b0dY: http://umeet.uninet.edu/
war manty: egrep -vi '(join|part) log :P
paranoidd obiwan: I saw some research on interpreting invalid instructions on lower-end processors (of course it's only based in a same family of processors, such as x86, though)
init64 constructors may implement a way to make the OS able to change the cache lines policy
manty war: :-)
n0b0dY riel. thanks
rp has anyone got this talk logged?
SnakeFooc riel ok
riel rp: yes, the log will be published on umeet.uninet.edu later
sh0nX rp: yeah as long as ShawnX stays online ;-)
sh0nX heh
SpyderMan Alan_Q: are the caching algorithms efficient or could they be improved if, say, the kernel could influence cache content?
rp thanks
safo [in Spanish] Perhaps the way to increase speed would be to change the address translation scheme we have in segmented and paged memory management?
SnakeFooc riel wran whent
init64 Alan_Q : some cache have another way to find lines. They use 1 "comparator" per cache line
init64 but I guess it's too expensive for big amounts of cache
SnakeFooc riel warn whent
runlevel0 sorry, any URL where the logs will be, those channel msgs annoys me a lot
tarzeau runlevel0: see topic
sarnold runlevel0: /topic :)
runlevel0 ok THX ;)
cybtro I am really sorry... can you repeat the site lo read the logs please? thanks a lot
Heimy cybtro: see the topic..
sarnold cybtro: /topic :)
cybtro :)
cybtro hehe yes you're right thanks
* sh0nX looks at demo1
tarzeau how do you run demo1 in single user mode?
tarzeau and all other processes killed?
tarzeau what about dos? that runs not in protected mode, but linux is in pm
tarzeau does it matter?
rp i am running demo1 with ./a.out and cannot see anything printed
d33p rp: same here
ifvoid how large are the P4 L1 and L2 cache?
rp d33p wait i got it
Alan_Q rp: you may need to wait a while or make the loop smaller for slower processors
tarzeau ifvoid: i wonder how's it for sparc/ultrasparc and powerpc
runlevel0 [ Alan ] So what demo1 does is much like a large number of perfectly normal applications... buf, I would better not run it right now, compiling the X ;)
tarzeau ifvoid: g3 and g4 that is
rp Step of 1 across 4K took 80 seconds.
rp this is what I got
sh0nX Alan_Q: What is the best/average number of hits in (%) the kernel can get with the processor caches?
davej ifvoid: http://www.codemonkey.org.uk/x86info/results/Intel/pentium4-northwood-HT.txt
Alan_Q rp: make it run 1/0th of the number of times 8)
Alan_Q 1/0 -> 1/10
rp Alan_Q How???
sarnold rp: remove one of the zeros from the for loop
d33p Alan_Q: which loop do we make smaller?
sarnold d33p: (the inner for loop)
rp yes I have removed a zero and recompiled
rp Step of 1 across 4K took 8 seconds.
fets Step of 32 across 128K took 4 seconds.
fets Step of 64 across 256K took 10 seconds.
fets Step of 128 across 512K took 36 seconds.
fets this is a xeon ;-)
rp I have AMD k6-II
sh0nX Step of 1 across 4K took 29 seconds.
sh0nX (in X with KDE) so its gonna take longer
Arador sh0nX: what machine?
sh0nX Athlon MP 2000+
docelic here, all up to 64k takes 5 secs, then 128: 7 sec, 256: 19 sec
sh0nX UP right now
sh0nX 2.5.51
sh0nX Step of 2 across 8K took 26 seconds.
sh0nX Step of 4 across 16K took 25 seconds.
smesjz Step of 1 across 4K took 50 seconds. (p133/2.5.49) :)
sh0nX Step of 8 across 32K took 28 seconds.
sh0nX Step of 16 across 64K took 26 seconds.
sh0nX Step of 32 across 128K took 32 seconds.
sh0nX im sure it would be lower in single user mode
cored sh0nX do you test the demo2.c program?
tarzeau can someoen run these tests on powerpc/sparc/ultrasparc?
sh0nX running
rp Reference run one took 29 seconds.
sh0nX Reference run one took 4 seconds.
sh0nX 64K table size took 5 seconds.
sh0nX 128K table size took 7 seconds.
paranoidd is it possible to use the Linux SMP implementation of Intel multiprocessor as base for an implementation of asynchronous MP? (e.g. using the sound card's DSP)
sh0nX 256K table size took 15 seconds.
cored shit
cored my pentium mmx is slow :(
sh0nX 512K table size took 19 seconds.
cored i have to change the loop to 100 i think
sh0nX if I run this on my Pentium 233MMX
sh0nX it'll be much slower
smesjz Alan_Q: it surprises me that 5 out of demo1 tests returns 50 seconds runtime (100k iterations)
sh0nX 1024K table size took 22 seconds.
sh0nX 2048K table size took 22 seconds.
sh0nX 4096K table size took 21 seconds.
zwane paranoidd: nope
sh0nX odd
rp can someone please explain exact meaning of 1024 tablesize?
Heimy tarzeau: maybe we could use vore for the sparc test. It's doing nothing right now }:)
rp i mean tablesize
sh0nX 4096 too LESS then 2048?!
sh0nX :))
sh0nX took
* psy is gone.. autoaway after 15 min [obv/lp]
tarzeau Heimy: vore! heimy! that's debian :)
Heimy tarzeau: yep O:)
:: Join: reynaert (me134.184.49.28) to #qc
SnakeFooc Alan_Q: Help SYSTRAN - Internet translation technologies
SnakeFooc alive in Argentinean asi that is average difficult to buy a pentium to me IV this the price to clouds
rp shall I assume that the lookup tables are in main memory
SnakeFooc oop
|Seifer nas
Heimy SnakeFooc Alan_Q: alive in Argentinean asi that is average difficult to buy a pentium to me IV this the price to clouds
Heimy ?
SnakeFooc sorry
Heimy SnakeFooc: erm...
Alan_Q snake: they arent cheap here either
ifvoid tarzeau: this is demo1 on alpha:
ifvoid Step of 1 across 4K took 4 seconds.
ifvoid Step of 2 across 8K took 4 seconds.
ifvoid Step of 4 across 16K took 4 seconds.
ifvoid Step of 8 across 32K took 1 seconds.
ifvoid Step of 16 across 64K took 2 seconds.
ifvoid Step of 32 across 128K took 8 seconds.
Alan_Q its not my PIV 8)
ifvoid Step of 64 across 256K took 11 seconds.
ifvoid Step of 128 across 512K took 10 seconds.
sh0nX ifvoid: in single user?
ifvoid sh0nX: no
sh0nX in X?
psy ifvoid, whats your processor?
BorZung Step of 16 across 64K took 19 seconds.
BorZung Step of 32 across 128K took 151 seconds.
ifvoid psy: ev6 I think
SnakeFooc it was to me my keyboard
ifvoid but it's a 4-proc machine, with a load of about 2 atm
psy mine took 27 secs at all steps
ifvoid (I removed one 0 btw)
sh0nX Alan_Q: that explains why higher buffers took the same amount of time on my machine.
ridiculum is there in linux any tool like Vtune (intel) to debug thinks like alan is explain?
psy ifvoid, how?
ifvoid psy: ?
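The source of demo1 is not reproduced in this log. Judging only from the output lines quoted above ("Step of N across M K took S seconds", with the footprint growing as the step grows), it presumably touches a fixed number of locations spaced further and further apart. The C sketch below is a guess at that general shape, not Alan's code; the names stride_walk and time_stride are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Read `count` bytes of `buf`, `step` bytes apart, `passes` times over.
   The number of reads is the same for every step; only the memory
   footprint (count * step bytes) grows. The running sum is returned so
   the work cannot be optimised away. */
unsigned long stride_walk(const volatile unsigned char *buf, size_t count,
                          size_t step, unsigned long passes)
{
    unsigned long sum = 0;
    assert(step > 0);
    for (unsigned long p = 0; p < passes; p++)
        for (size_t i = 0; i < count; i++)
            sum += buf[i * step];
    return sum;
}

/* CPU seconds spent on one configuration, or -1.0 if allocation fails. */
double time_stride(size_t count, size_t step, unsigned long passes)
{
    unsigned char *buf = calloc(count * step, 1);
    if (buf == NULL)
        return -1.0;
    clock_t start = clock();
    (void)stride_walk(buf, count, step, passes);
    clock_t end = clock();
    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_stride(4096, s, passes) for s = 1, 2, 4, ... 128 would cover the same 4K to 512K range as the results pasted in the channel. The pass count has to be tuned per machine, which is what "removing a zero from the inner loop" amounts to.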
psy you removed one 0
ifvoid yeah
ifvoid from the inner loop
psy how did u?
psy ah
sh0nX oh :)
psy ok
ridiculum maybe explaning ? (my english it's not good)
ifvoid so, multiply the run times by 10
psy i see heh
SnakeFooc ridiculum . [in Spanish] what's up
L-i-n-u-X Karina: :****
SnakeFooc Alan_Q: alive in Argentinean asi that is average difficult to buy a pentium to me IV this the price to clouds
SnakeFooc L-i-n-u-X !!!!11
runlevel0 SnakeFooc: behave plz
SnakeFooc oks
ifvoid hmm, fiddling with the compiler options makes a lot of difference for demo1
sh0nX Alan_Q: I always thought having data aligned on 512/1024, etc was a good thing, since those are common alignments and should be easier for the processer to split things into?
ifvoid I wonder why
sarnold SnakeFooc: alan responded .. he said they are expensive for him too
Alan_Q ifvoid: always use -O2 or -O3 for those tests, you want to measure the cpu not the compiler 8)
SnakeFooc okssss
rp i stilll don't get what *eactly* are lookup tables...
sapan Alan_Q: is demo1.c also a statement in favour of implementing limited no.s of svcs, like httpd in the kernel? since then send loops etc. would be much tighter?
rp can someone point me to an URL explaining those
avoozl valgrind also might be interesting to look at, the latest version also can do cache simulation and show where in a program cache misses are occuring
rp please?
psy well, i gotta go
tarzeau were there cpu's without cache? intels 80286?
Heimy rp: a table with precalculated data
d33p what about grof that comes with the GNU binutils?
psy gonna check the log later, see you
d33p s/grof/gprof
sarnold tarzeau: 486s were frequently sold without cache to make them cheaper :)
rp aLAN SAYS: so you can avoid lookup tables when doing things like colour conversion
rp Heimy: what does this mean then?
Heimy rp: erm...
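For readers following rp's question: a lookup table trades arithmetic for memory reads. The C sketch below shows the same colour-to-grey conversion done both ways. It is only an illustration; the log does not say which conversion Alan had in mind, the weights 77/150/29 are the common integer approximation of 0.299/0.587/0.114, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Direct calculation: three multiplies, two adds and a shift per pixel. */
uint8_t luma_calc(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
}

/* Lookup-table variant: three 256-entry tables of precomputed products.
   3 * 256 * 2 bytes = 1.5 KB of table data that must stay cached for
   this version to be worthwhile. */
static uint16_t lut_r[256], lut_g[256], lut_b[256];

void luma_init_tables(void)
{
    for (unsigned i = 0; i < 256; i++) {
        lut_r[i] = (uint16_t)(77u * i);
        lut_g[i] = (uint16_t)(150u * i);
        lut_b[i] = (uint16_t)(29u * i);
    }
}

uint8_t luma_lookup(uint8_t r, uint8_t g, uint8_t b)
{
    assert(lut_g[1] == 150u); /* luma_init_tables() must have been called */
    return (uint8_t)((lut_r[r] + lut_g[g] + lut_b[b]) >> 8);
}
```

Both functions return the same value for every input. Which one is faster depends on whether the tables are still in cache when they are needed: a table read that misses the cache can cost far more than the handful of integer operations it replaces.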
docelic sarnold: hehe yea, like the "Packard Bell" 486s :))
Heimy rp: Alan told about that just a few minutes ago
tarzeau sarnold: aha :) same with the cheap p2 with less cache
tarzeau and harddisks...
tarzeau i've have one 512mb one without cache (horribly slow)
tarzeau i wonder what's that altivec stuff on powerpc's
rene 386 also had no onboard cache.
SnakeFooc Alan_Q: new kernels is going to bring support for machinery old so that the new ones are very great and heavy or make difficult the compilation much
zwane tarzeau: you have a cache
Heimy rp: Sometimes, if you want to speed up some expensive repetitive calculations, you can make a table of precalculated data
tarzeau rene: heck some didn't have a fpu inside
Heimy rp: And just look at that table for data
rp Heimy: ok
rp Heimy: ok...getting it
tarzeau rene: didn't they sell separate weitek c(?)pu's?
tarzeau zwane: i have a cache?
runlevel0 sarnold: I have an old Pentium board with this external 256 k caches
rene tarzeau: none did. the fpu got integrated in the 486(dx)
sh0nX math co-processors
Heimy rp: But looking at that table is sometimes even more expensive than the calculation itself
zwane tarzeau: you're thinking of no L2
sh0nX the SX is without co-pro
Heimy rp: except if it fits on caché
tarzeau but why does cpu access to 16bytes of l1/l2 cache? the registers are 32bit since 386.. and still they are only 32bit on intel
tarzeau eax...
rp Heimy: and usually the tables are in main memory, right?
aka_mc2 QUESTION: ¿¿but what is the alternative ot actual slow cache?? ;)
Heimy rp: exactly. If they're on cache memory, then it will be very fast to look up that tables
rp Heimy: oh! got it now...thanks
SnakeFooc Alan_Q: new kernels is going to bring support for machinery old so that the new ones are very great and heavy or make difficult the compilation much
sh0nX Please don't repeat SnakeFooc
SnakeFooc sh0nX oks :D
rene tarzeau: on Pentium (I, MMX, Pro, II, III) the cacheline-line is 32 bytes, not 16. on p4 and athlon it's 64 bytes
Ikarus PPRO actually came in 2 MB aswell
SnakeFooc sorry
rene cache*line*
tarzeau those edo memory sticks were 60ns and 70ns, how ns are l1 and l2 caches?
tarzeau (72pin thingies)
Ikarus tarzeau: iirc, about 12 ns on Pentiums for L2
sh0nX i have 60ns EDO on this P233MMX
Ikarus (or atleast on the ones I pulled apart)
sh0nX it DOES make a difference
tarzeau sh0nX: i do in my sparc classic :)
sh0nX heh
|Seifer [in Spanish] Is it possible that GCC 3.2 has problems compiling certain things?
SnakeFooc Alan_Q my cuestion ?
aka_mc2 DDR PC2100 can be 2.0 CAS latency
ifvoid |Seifer: could you please repeat the question in english?
sh0nX ifvoid: someone can translate it :)
SnakeFooc |Seifer [in Spanish] I'll take care of it
debUgo- tarzeau: L1 and L2 caches runs at full CPU speed in actual processors, so, do the math ;)
Ikarus debUgo-: not true
|Seifer SnakeFooc, ok
Heimy <|Seifer:> It's possible that GCC 3.2 have problems compiling some things
Ikarus debUgo-: on the Pentium they ran at bus speed
Heimy ?
Heimy erm... That was a question
Heimy <|Seifer:> It's possible that GCC 3.2 have problems compiling some things?
Heimy Better :)
Ikarus debUgo-: and on the Pentium II they ran at 1/2'th CPU speed
debUgo- _ACTUAL_ processors, Pentium I is not an actual processor
SnakeFooc Alan_Q: It is possible that GCC 3,2 has problems when compiling certain things?
|Seifer Heimy, thx
Heimy |Seifer: ;)
sarnold SnakeFooc: no compiler is perfect
sh0nX Run 1 (silly way) took 16 seconds.
tarzeau debUgo-: so we loose alot of time between cpu and memory and memory and harddisk! :( umf
sh0nX Run 2 (smart way) took 10 seconds.
sh0nX Run 3 (live data only) took 5 seconds.
ifvoid Run 1 (silly way) took 5 seconds.
ifvoid Run 2 (smart way) took 2 seconds.
ifvoid Run 3 (live data only) took 0 seconds.
sh0nX interesting
SnakeFooc sarnold for ?
tarzeau i lose most time turning on/off my computer
Ikarus ifvoid: it optimised the third run into oblivion ?
sh0nX ifvoid: what processor do you have?
sh0nX i used no optimization
debUgo- tarzeau: harddisk speed really sux =P
mcp rootcodeman:[/tmp] # ./demo4
mcp Run 1 (silly way) took 23 seconds.
mcp Run 2 (smart way) took 13 seconds.
mcp Run 3 (live data only) took 6 seconds.
sh0nX Run 1 (silly way) took 5 seconds.
sh0nX Run 2 (smart way) took 4 seconds.
sh0nX Run 3 (live data only) took 1 seconds.
tarzeau debUgo-: ever seen a tapedrive?
sh0nX ^^ With optimization
ifvoid Ikarus: cc -fast, nothing fancy
ifvoid sh0nX: alpha EV6
sh0nX -O3 here
Ikarus ifvoid: which cc ?
d33p 48,29 and 15 on a p3 1ghz running at 700mhz
|Seifer [in Spanish] because I have problems compiling kernels 2.4.3 and 2.4.17 with GCC 3.2
sh0nX Run 1 (silly way) took 4 seconds.
sh0nX Run 2 (smart way) took 4 seconds.
sh0nX Run 3 (live data only) took 1 seconds.
sh0nX gcc demo4.c -O5 -o a
rp Run 1 (silly way) took 62 seconds.
sh0nX gcc version 3.2.2 20021123 (prerelease)
SnakeFooc because I have problems when compiling kernel 2,4,3 and 2,4,17 with GCC 3.2
ifvoid Ikarus: Compaq C V6.4-014 on Compaq Tru64 UNIX V5.1A (Rev. 1885), Compiler Driver V6.4-215 (sys) cc Driver
runlevel0 sh0nX -O5 ??? is that much necessary ?
tarzeau does rms and linus also give talks on irc?
sh0nX i could go higher
sh0nX :)
jmgv plese let's fix talk issues!
debUgo- Alan_Q: how much affects cache associativeness in general memory performance?
ridiculum SnakeFooc not all kernels compile with gcc 3.2. i think the oficial compiler is still 2.95
sh0nX -O99
jmgv offtopis at #latertulia thank you
rp GIMP wowowwow
SnakeFooc oks
mcp woohoo, -O5 gives a amazing performance boost
sh0nX Run 1 (silly way) took 4 seconds.
sh0nX Run 2 (smart way) took 4 seconds.
sh0nX Run 3 (live data only) took 1 seconds.
sarnold |Seifer: please save that for some other forum; alan is talking about processor optimizations, not fixing compiling problems ;)
sh0nX that is the maximum preformance i can get
rp Run 2 (smart way) took 40 seconds.
rp Run 3 (live data only) took 18 seconds.
ridiculum SnakeFooc oficial for kernel compile. for other things you can use gcc 3.2
rp and I have 512Kb cache and still these results
riel rp: so your CPU spent 22 seconds playing with memory
d33p heh right ojn, compiling with optimisations is seriously skewing those results and the relative difference factor also decreased
riel rp: and only 18 seconds on the actual calculation
rp riel: 22 sec??? how
mcp Run 1 (silly way) took 7 seconds.
mcp Run 2 (smart way) took 5 seconds.
mcp Run 3 (live data only) took 2 seconds.
onki Run 1 (silly way) took 16 seconds.
onki Run 2 (smart way) took 13 seconds.
onki Run 3 (live data only) took 2 seconds.
riel rp: yes, it spends more time waiting for memory than doing something useful
riel onki: in your case it spent 7 times as much time waiting for memory as it spent doing real work
onki yeah, I see
rp was it because I have lot of cache (512)
onki I only have 128
fets I have quad xeons, so i'm interested :P
SnakeFooc as it is the cause of the problem?
yalu I have one of those athlons witrh 512 Kbyte cache... but slow cache
yalu athlon classic
fets and I'll have some NUMA ibm x440's to play around with next month
ifvoid Alan_Q: won't that change for the Hammer and Itanium 3?
rp so does that mean memory performance does not depend only on processor but also on bus speed?
runlevel0 <Alan>So a dual processor machine gives us twice the problem :so this explains why we do not get 2x the performance of 1 porcessor
bzzz Alan_Q: could you describe coherent related problem x86 has to do with cache?
debUgo- AFAIK, dual Athlon have dual bus (a bus for each CPU)
ridiculum what about hyperthreading and cache coherence?
sh0nX so this is where spinlocks come in
SnakeFooc Alan_Q: GCC 2,96 of Network Hat 7,1 in RH 8 can be put ?
sh0nX ahh ok :)
sh0nX thanks Alan :)
ifvoid sh0nX: what's a spinlock?
E0x [in Spanish] if that is so, what would be the advantage of dual servers in the end?
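On ifvoid's spinlock question: a spinlock is a lock that a CPU acquires by retrying in a tight loop instead of going to sleep. The C sketch below is a minimal userland illustration using C11 atomics, which did not exist when this session took place. It is not the Linux kernel's spinlock, and the names demo_spinlock, spin_lock, spin_trylock and spin_unlock are chosen for this example only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held; /* set while some thread owns the lock */
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt: returns true if the lock was free and is now ours. */
bool spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is ours. */
void spin_lock(demo_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin: every failed attempt is still a write to the lock */
}

void spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The connection to the cache discussion is that test-and-set is a write. While several CPUs spin on the same lock, the cache line holding it keeps moving between them, which is one reason contended locks are expensive on SMP machines.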
docelic Id appreciate more on spinlocks too
sarnold ifvoid: a processor just spins waiting for a resource to become available in a "busywait" loop
SnakeFooc E0x [in Spanish] in English, hehe
SnakeFooc :D
bzzz Alan_Q: how pci devices may see data which in cache only?
sh0nX :)
sh0nX ifvoid: what sarnold said
sh0nX heh
Arador E0x: what's the advantage of dual servers then?
sh0nX spinlocks let the kernel use SMP
rp Alan says : One thing the processors have to do is.......
rp isn't it the job of OS?
sarnold rp: on some systems, yes
ridiculum cache coherece it's a hardware problem
rp oh! ok
Arador what about preempt, can preempt do more cache misess?
tarzeau i noticed linux on my sparc classic (50mhz) is horribly slow compared to netbsd (1.6.x)
rene "we" have to kick? we as in the OS, or is that hardware-automatic (on x86)
tarzeau i think it was a late 2.2.x kernel (of debian), which i compiled myself for the qe, and it took 5 hours
ridiculum tarzeau that's true? i have a debian on a sparc classic
sarnold rene: on most machines, we the processor
rene sarnold: yeah... that's what I thought
ridiculum tarzeau ssh it is slow but you can compile with fpu enabled and have more speed
tarzeau ridiculum: yeah i noticed it swaps much less with netbsd and generally responds better (i've got x+amiwm and mpg123 on it)
tarzeau ridiculum: www.linuks.mine.nu/screenshots/netbsd.png
ridiculum tarzeau argg. i haven't X. it's only a mini-server
tarzeau ridiculum: you mean with -g8m or something like that? is it really faster? i've used telnet on it for remote x
Arador tarzeau: 404
tarzeau oh wait amiwm.png
sh0nX Alan_Q: so we should be using the cache for SMP processors to keep data that isnt going to change much and use the processors to handle data that does change often?
fets alan: (ibm) this is in the -summit kernels ?
rp sh0nX: SMP?
sh0nX rp: multiprocessor
sarnold rp: Symmettric MultiProcessor
rp oh!
sh0nX Alan_Q: thats terrible :/
ridiculum tarzeau debian compile all distro with math emulated. it not use fpu. that's horrible
Ikarus ridiculum: which is ofcourse not true
tarzeau ridiculum: at least they have super cow powers!
Ikarus (or atleast as far as I have been able to see)
ridiculum Ikarus it's true
ifvoid ridiculum: it's not
ridiculum Ikarus debian woody
tarzeau ridiculum: and a large user base to get help from (unlike rh/suse), www.linuks.mine.nu/debian-worldmap (and lets cut it here before we have a distro war)
ifvoid ridiculum: it's just not optimized for specific processors
ridiculum Ikarus ssh is very,very slow
ifvoid ridiculum: so no MMX, SSE, 3dNOW
sarnold ridiculum: ssh uses integer math, not floating point
tarzeau ridiculum: only on really slow computers like 486 and other systems under 100mhz
ridiculum ifvoid sparc haven't MMX, SSE ;)
acme Alan_Q: does any of the standard libjpeg, libtiff, libpng, etc take advantage of SMP in the fashion you described?
tarzeau ridiculum: yyes there's sparc v7,8,9
Ikarus ridiculum: hold on, sparc is different, it doesn't have FPU emu in the kernel
ridiculum tarzeau sparc system. i talk about sparc sysmtem's
sh0nX Alan_Q: so we want to keep both processors doing OTHER things
tarzeau ridiculum: and if they'd optimize for one it wouldn't run on the older ones!
ridiculum tarzeau debian compile for V7
sh0nX and not tasks that involve the same sort of data
tarzeau ridiculum: and people have v7!
sh0nX ahh ok :)
tarzeau ridiculum: check #sparc on opn/fn
Ikarus ridiculum: so to get it to run on ALL sparcs it is all compiled without FPU
sh0nX that makes sense.
ridiculum tarzeau argg. V7 it's very very old. maybe 10 years old? more?
tarzeau ridiculum: it's another thing why linux is slow on sparc classic's not that optimization
tarzeau ridiculum: the 2.2.x tree wasn't updated and there's something really badly slow because of something (i just don't remember what)
tarzeau ridiculum: my classiccpu has 1991 on it, that's 10 years too
ridiculum tarzeau i compile a 2.4.20 and it's ok
tarzeau ridiculum: on you sparc classic you've got 2.4.20 ?
ridiculum tarzeau yes. on a sparc classic and on a SS5
rene I do not wish to be a bore, and I'm certainly not a moderator, but could we keep this channel for questions to Alan?
tarzeau apropos .. what about that mmu-less stuff? can i run linux on my amiga 1200 (standard) one soon?
sh0nX Alan_Q: so when designing SMP applications, how do we tell which processor to handle which data without causing the processors to both handle the same data?
sh0nX in the kernel we use spinlocks
sh0nX but in userland i dont know how that works
yalu Alan_Q: is the scheduler smart enough to keep threads who share a lot of data on the same processor?
ridiculum sh0nX semaphores
sh0nX ridiculum: so threads basically
ridiculum sh0nX or process. you can have 2 process (or more) with shared memory
zwane Alan_Q: All this must get really interesting with Hyperthreaded cpus
bvc does it sound right that i am unable to use protection map from a module?
sh0nX we create threads in userland, and then the kernel handles this with its scheduling
sh0nX I see now how it fits together
Alan_Q sh0nx: yep
sh0nX ah :)
zwane Alan_Q: do you reckon scheduler only would suffice? How about leveraging cpu affinity for say doing bias in interrupt handling?
jacobo Alan_Q: could you please wait a couple of seconds after pasting text, to make the translators a bit happier? ;-)
Alan_Q ok
sh0nX so, if a program is written for UP, how does the kernel scheduler handle its data on two CPUs? or it can't
zwane Alan_Q: so you wouldn't go the RR ioapic interrupt distribution way as in 2.5? Or has this changed
muli sh0nX, what does "written for UP" mean? single process?
sh0nX yes
sh0nX uniprocessor
sarnold Alan_Q: does linux currently have a mechanism to specify that all interrupts should be handled by a specific [set of] CPUs?
sh0nX I see, so we have to use threads in our code in order to benifit SMP
sh0nX benefit even
muli sh0nX, threads or multiple processes
muli if you have just one thread of execution, nothing can split it up to multiple cpus for you
sarnold sh0nX: or just run enough applications on the machine....
sh0nX muli: forked apps?
sh0nX I see
sarnold Alan_Q: is it worth prefetching the next pointer one is going to follow?
riel that list structure will make for an interesting list_add() ;)
sh0nX hehehe
Alan_Q sarnold : tell me when they recover
runlevel0 XD
sh0nX we had multiple translators it would speed things up *grin*
sh0nX but the problem would be duplciate data being translated
sh0nX ;)
yalu they share too much data I'm afraid :)
sh0nX heh
jacobo sh0nX: we do, but we're translating serially
Alan_Q sh0nx: you have to share the data carefully
sh0nX right.
yalu you need an extra scheduler then :)
jacobo I was translating before, now Heimy is
jacobo the pace was good until sh0nX started to ask ;-)
sh0nX sorry ;-)
sh0nX SMP is interesting to me
sh0nX now all i need is one more processor have SMP :)
muli Alan_Q is gcc any good at optimizing for a given cpu's predicted cache usage?
sh0nX (soon)
Alan_Q muli: its hard for the compiler to do that
sh0nX to have SMP even.
Alan_Q but gcc doesnt always do a good job 8(
grifferz Alan_Q: how applicable are things like prefecth to userland programming without knowing about the hardware? e.g. is it possible that doing some prefecth that speeds up something for a 2 CPU x86 system will actually cause worse performance on a 4 CPU sparc?
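One common way to "share the data carefully" is to make sure data written by different CPUs does not sit in the same cache line. The C sketch below pads per-worker counters out to one line each. It assumes the 64-byte line size rene quotes for the P4 and Athlon; the worker count, the names padded_counter and count_event, and the C11 alignas keyword are choices made for this example, not something taken from the talk.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumption: 64-byte lines; other CPUs differ */
#define NWORKERS 4    /* hypothetical number of threads or CPUs */

/* Aligning the only member to a full line also rounds the struct size up
   to a full line, so neighbouring array elements never share one. */
struct padded_counter {
    alignas(CACHE_LINE) unsigned long value;
};

struct padded_counter counters[NWORKERS];

/* Each worker increments only its own slot, so no line is written by
   more than one CPU. */
void count_event(unsigned worker)
{
    assert(worker < NWORKERS);
    counters[worker].value++;
}
```

The cost is memory: 64 bytes are spent on each 4 or 8 byte counter. Whether the padding pays off depends on how often the counters are written and on the real line size of the machine, which this sketch simply hard-codes.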
riel muli: often it cannot do much
muli one more question, is there a "lowest common denominator" cache behaviour, or is it possible that optimizing for one cpu will be a pessimization on another?
riel muli: because you tell the computer what to do
sarnold Alan_Q: ok, i think they are about caught up :) thanks
Alan_Q hold them for a minnute.. lets do the last 10 lines of the talk firs t8)
sh0nX heh
riel muli: for example, a C compiler usually isn't allowed to change your data structures for you
grifferz I am impressed that I managed to typo "prefetch" the same way twice
muli riel, I was thinking of teaching gcc to do stuff like prefetch by itself
viZard Alan_Q, you can continue
viZard we´ll catch you up ;-)
MJesus traslator are ready, thanks
davej muli: There's __builtin_prefetch
muli riel, I think the compiler is not allowed to reorder your members, but it's free to pad as it sees fit.
muli did anyone measure / think about the effects of kernel preemption on cache usage?
rene wouldn't turning lists into "lists of little arrays" (dynamically adding a block when required) help with the prefetching for lists?
sh0nX I assume we use some sort of spinlock to prevent another processor from prefetching the same data?
Alan_Q uggh lagging
* sh0nX is beginning to understand how this all works slowly
sh0nX now if i can figure out the kernel API ;)
MJesus Alan this is the last in #linux:
MJesus [20:36] <Alan> sh0nX] I assume we use some sort of spinlock to prevent another
MJesus [20:36] <Alan> processor from prefetching the same data?
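davej's pointer to __builtin_prefetch and sarnold's earlier question about prefetching the next pointer can be put together in a few lines. The C sketch below walks a singly linked list and hints the next node while the current one is processed. It relies on the GCC/Clang builtin and compiles to a plain loop elsewhere; the struct layout and the name sum_list are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    long payload;
};

/* Sum the payloads, asking the CPU to start loading the next node early. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
#if defined(__GNUC__)
        /* A hint only. Arguments: address, 0 = we will read it,
           1 = low expected reuse. A NULL address at the end of the
           list is harmless because prefetches do not fault. */
        __builtin_prefetch(n->next, 0, 1);
#endif
        total += n->payload;
    }
    return total;
}
```

With this little work per node the hint probably arrives too late to hide much latency, and as grifferz's question suggests, a prefetch that helps on one machine may do nothing or cost memory bandwidth on another. Measuring with and without it is the only reliable guide.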
sh0nX I see
Ikarus recursivity, fun
sh0nX thats bad
sarnold sh0nX: a good resource for more details is a book by curt schimmel, Unix Systems for Modern Architectures: symmetric multiprocessing and caching for kernel programmers
sh0nX :)
sh0nX sarnold :)
riel sarnold: a very nice book, indeed
sh0nX sarnold: once i get to understand how 2.5/2.6 works i then I can dive in more
sarnold Alan2: i've wondered if prefetching cuts memory bandwidth significantly.. have people tested with prefetch config'ed away?
aka_mc2 ALAN: do you know Crusoe processor? it has how amount of cache?
tarzeau was this talk announced somewhere? i just heard about it last minute in #debian on opn/fn
Ikarus tarzeau: all talk info is on the umeet website
jacobo tarzeau: yes
rene Alan2: talk seemd to be about cacheing alone. do things like instruction alignment make a lot of difference om modern processors?
riel tarzeau: it was also on the LWN calendar
riel and a bunch of other places
tarzeau riel: oh well if one doesn't read lwn :)
Arador Alan2: what're the effects of preempt on caching?
docelic so umeet is the place to look for similar events? some other places maybe ?
bitland implementation of UMA (United Memory Access, in some Silicon Graphics) in mainboard chipsets could be a better solution for most problems like these? (excuse my bad english) :)
sh0nX I always thought we wanted to flag prefetched data as being prefetched already, im surprised another processor will fetch it again.
sh0nX even if it doesnt seem like much of a preformance penality doing so
sklav Hi guys
tarzeau docelic: i can't wait to see rms or linux talk on irc
sklav i was wondering what effect using optimization -03 or -05 have on the kernel?
tarzeau alan what irc client did you use?
bev Blackend
sh0nX since Alan mentioned we dont want to work on the same data on both processors prefetching the (same) data seems to contradict this?
aka_mc2 ALAN: do you think that Crusoe processor, Linux supported, it will be considered for all these programmation techs??
Arador sklav: AFAIK, -ON where N>3 means -O3....(don't know if it happens nowadays)
sklav Arador: by default the kernel itself uses -02
sklav but it can be changed in the Makefile to -03 and so on
sklav But im nt sure if this causes other problems
sklav Like a performance hit
riel docelic: maybe #kernelnewbies could organise some isolated lectures throughout the year
sklav i have noticed higher load averages after i use a kernel with -03 and or -05
riel docelic: but as far as I'm concerned, UMEET is the place to cluster a bunch of lectures ;)
mulix riel, that would be a great idea
mulix like a biweekly or monthly lecture from the kernel guru of your choice :-)
sh0nX riel: yes
tarzeau mulix: yeah i'd like that too
sh0nX riel: I'd especially like to learn how to use PnP on 2.5 ;-)
sh0nX hehe
tarzeau Alan2: thanks alan, it was a nice talk you gave
sarnold docelic: note that uninet hosts other lectures; such as security, ipv6, some for medical doctors, etc.. all sorts. :)
Arador sklav: -O3 means that gcc adds some extra inlining that weren't requested, i think
sklav ok
docelic sarnold: even better
jmgv Alan? dont you think a lot of the work about registers users and other questions depend of the compiler and that made us lose some control about those issues?
sklav Thanks Arador and Alan
aka_mc2 ALAN: there is in a future an alternative for the cache memory? (another system of fast data access can be...)
E0x the tecnology of HT represent a avance about this problem .... ?
mattam Alan2: prediction's better than having a loop unrolled ?
sh0nX aka_mc2: parrallel data cache? ;)
sh0nX parallel
aka_mc2 ok, thanks shnox
Ikarus Alan: do you think a L3 cache shared between SMP processors would give a significant performance benefit
sh0nX how how about multiprocessed cache?
sh0nX aka_mc2: im just guessing sh0nX a cache that is smart enough to decide what's needed for the processors jmgv i see. but.... aka_mc2 shn0x: xD jmgv i've compiled the same program using gcc and the intel compiler, and intel produces code 30% faster, Ikarus jmgv: it averages out to a lot lower than that sh0nX I'd like to see a mini 'processor cache' sklav jmgv: does the intel compiler have the same benefits on an AMD? runlevel0 jmgv: can I use the intel compiler on an athlon ? Ikarus something like 8-10 % jmgv sklav: i know sh0nX one that actually can make decisions with the help of the cpu jmgv i don't doubt gcc's quality sapan Alan2: you said "we know that only memory of certain sizes at certain offsets can be cached" could you explain? sklav Just curious jmgv Im not a compiling guru by any stretch of the imagination sklav ;) sklav Although i can impress my friends! lol jmgv sklav: :) sapan i c sklav Well to be honest i find it very cool that the linux community as a whole even does irc sessions like this sh0nX ;) sh0nX heh * sklav Smokes a cigarette as he reads the messages sh0nX you can always expect a huge turnout for Alan's lectures :-) Heimy ;) jacobo yeah rp I can't find where this talk is logged on the site sklav Im surprised he has found the time to give 1 jacobo I wouldn't expect that for one of mine ;) error27 Alan2: I have heard talk about creating ram with intelligence built in so you can do simple operations to it directly instead of using the cpu. What do you think about that idea? rp Thanks Alan Heimy rp: this is not logged right now sarnold rp: it takes a little while to clean and prettify the logs.. they will be up soon. :) sh0nX hehe rp Heimy: oh! sh0nX since we're offtopic now: Alan2: Do you have a patch for the amd76x_pm module for 2.5.xx? sarnold rp: (look in the "congress details" piece...)
jacobo rp: you'll be able to find it later in "Congress Details" docelic it would be nice to separate it into a document without community questions (like, to form an article of what Alan said), and then put Q/A section separately Heimy rp: we have to strip some things, and beautify it a bit Heimy rp: :) rp oh! great waiting for that sarnold docelic: it is BorZung the slices docelic .. rp I missed many things...am very new to irc and Linux tarzeau oh my god i've missed half of the talks runlevel0 coywofl: Win is *faster* if you throw it from a 9-storey building XD sapan Alan2: I have an iPAQ with familiar running 2.4.18-rmk - if I were to optimize things in the kernel/apps in general, what should I be looking at? jacobo rp: you'll find previous ones already there, in case you want to have a look :) rp jacobo: I am just looking sh0nX Alan2: im trying to port it right now sh0nX but insmod is oopsing kernel ;-) docelic BorZung: well .. yea, kind of. E0x Alan2 what is prefer procesor ? sh0nX Dec 13 22:22:05 unknown kernel: amd76x_pm: Version amd76x_pmhardware driver0.1.0 sh0nX Dec 13 22:22:05 unknown kernel: amd76x_pm: Initializing southbridge Advanced Micro Dev sh0nX ic AMD-768 [Opus] ACPI sh0nX Dec 13 22:22:05 unknown kernel: Unable to handle kernel NULL pointer dereference at vi sh0nX rtual address 00000026 sh0nX ;/ sh0nX if only i could debug this more easily E0x Alan2 what is your preferred processor ?* sh0nX for some reason sh0nX Dec 13 23:21:19 unknown kernel: Process insmod (pid: 8180, threadinfo=c8b08000 task=db sh0nX 2e0040) sh0nX unless modules are still broken in 2.5.51 sh0nX ;) apuigsech In the GDT we find many null (unused) descriptors; is that to improve performance in the use of the cache? sklav Actually i have yet to be able to compile 2.5.x sklav I keep getting errors in make modules sarnold no fan needed? ooooh!
:) Ikarus sarnold: well, the faster ones do need them sh0nX Alan2: What is the best way to do debugging with the kernel, since i can't use UML on hardware, would vmware work? or do i have to just keep doing the reboot thing ;) Ikarus but only tiny ones ms same for geode's but they are *SLOW* :o) EleTROn Alan2: speak about Crusoe ridiculum what's your opinion about itanium2? is it better than hammer? sarnold EleTROn: he mentioned them earlier.. debUgo- Alan2: do you attribute the reliability problems in Athlon systems to the CPU itself, the supporting chipsets, or something else? RaD|Tz Alan2: How about the Duron processors??? BorZung sh0nX botchs? EleTROn ok riel neat sh0nX BorZung: would bosh emulate all hardware? war He Al answering these questions via privmsg or a different channel? sh0nX botches sapan Alan2: [ignore if OT] I often get a "Machine Check Exception 7" on my Athlon which I can't decode even with dj's mce decoder. Any idea? war s/He/Is sarnold war: #linux war k war thx. davej sapan: mail me the output and I'll take a look. sh0nX BorZung: i thought it only emulates the core x86, not other things so much rene still, ia64 is a nice architecture sarnold rene: as long as your goal _isn't_ compiles.. :) sapan davej: hey! thanks a ton :) sh0nX davej: I don't see those Exception errors on 2.5.xx anymore. apuigsech Alan, in the GDT we can find some null descriptors (not used); is that to optimize cache memory usage? rene sarnold: the chip itself, I mean. :-) BorZung sh0nX i just heard about it sh0nX MCE errors davej sh0nX: interesting. sh0nX MCE: The hardware reports a non fatal, correctable incident occured on CPU 0.
Bank 0: 9409c00000000136 sh0nX something in later 2.5 must have fixed something rene apuigsech: in 2.4, the gaps between per-cpu TSS/LDT are for caches sh0nX 2.5.24 sh0nX was when i saw an MCE sh0nX but that's a while back :) rene (so that CPUs don't trample on each other's cache lines) apuigsech TSS and LDT are not used in 2.4 Alan ick rene apuigsech: they sure are :-) angelLuis OP to alan: MJesus apuigsech :) runlevel0 >Alan> "...or for windows bug compatibility in the bios", for god's sake, the chips are made thinking about Win bugs XD runlevel0 amazing sh0nX you know it's a good night when you have Alan speaking and a Hockey game tonight on tv :)) davej runlevel0: rumour has it the Pentium IV is model 15 due to a bug in Win NT. sh0nX it doesn't get better than that :) runlevel0 dajev: X_D runlevel0 my god angelLuis OP to alan2: sarnold, please Alan2 (getting lots of lag problems) Alan2 I need to vanish very shortly too 8( sh0nX Alan: do you visit #kernelnewbies? :) runlevel0 ok, im going to leave ;) tarzeau and #debian ? runlevel0 proud of meeting you Alan and all the rest runlevel0 ;) sklav Guys im off as well thanks for the enlightening conversation sh0nX Alan :) sklav Laterzzzzz runlevel0 bye, will get the logs later davej folks interested in the prefetching stuff Alan talked about may find the presentation at http://208.15.46.63/events/gdc2002.htm interesting Alan2 oops E0x <Alan2> ridiculm: right now I am betting firmly on the hammer < ---- why ? mulix alan, thanks for the talk... it was very interesting. mulix and it was cool to see you on IRC :-) Alan2 eox: it's aimed at the mass market, it's designed to run x86 software * sh0nX is very pleased with AMD processors (my first one) sh0nX I especially like the MPX chipset bev Alan before you leave... do you have any embedded linux devices doing crazy things in your house? sh0nX hopefully, i can help get some other things implemented from it sarnold (it appears we lost alan...)
* NiX has been using an AMD Thunderbird for about three years without any stability problems Heimy <riel> I suppose it's time to close the "official part" of this talk sh0nX Alan, it was a pleasure seeing you speak today Heimy argh Heimy :) rp Alan is gone??????? sh0nX thank you :) faiku when will the logs be online at umeet site? apuigsech Alan, what do you think about processors in future? the evolution.... jacobo faiku: later today or tomorrow morning jacobo %-) faiku ok.thanks sh0nX he vanished ;) angelLuis clap clap clap clap clap clap clap clap clap war /on ^window "*" { echo $strftime($time() (%I:%M:%S %P)) $1- }