--- Log opened Sat Dec 14 19:02:07 2002 |
tarzeau | why did alan remove the "printer is on fire" joke from the kernel? |
zwane | it might be worth people also looking at stuff like http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf |
docelic | how do I ignore join/part messages for the channel in xchat ? |
riel | docelic: set the channel to conference mode |
riel | click on the arrow to the right of the line where you talk |
obiwan | is it ok to ask a question at any time in #qc? |
riel | and click the "Conf" button |
viZard | obiwan, yes, is ok |
tarzeau | how is it in irssi? /set ? |
sh0nX | conf? |
sh0nX | :) |
sh0nX | i dont think xchat 1.8.10 has conf |
sh0nX | ;/ |
vegai | /help ignore on irssi |
riel | sh0nX: it does |
rene | tarzeau: /ignore #linux JOINS PARTS QUITS NICKS |
docelic | excellent, thanks riel |
sh0nX | riel: hmm |
riel | sh0nX: click on the arrow button to the right of the line where you talk |
tarzeau | rene: thanks |
sh0nX | hh there |
sh0nX | found it |
sh0nX | Alan_Q: How do we compare Athlon XP's to MP's with optimization? Since they are very close to being identical processors? |
rene | that 1 minute is grossly optimistic in my case |
SnakeFooc | alo |
SnakeFooc | speak spanish ? |
lukep | sorry, but the speed of light might be the fastest possible |
Heimy | SnakeFooc: if you want to read the lecture in Spanish, please, join #redes |
riel | SnakeFooc: the spanish translation is in #redes (I think) |
Heimy | SnakeFooc: This channel is only for questions |
choofi | riel right |
muli | the |
macTijn | hey riel :) |
SnakeFooc | Heimy oks .. |
psy | Alan_Q, how can we compare Athlon's processors with Pentium's? |
muli | speed of light really is too slow nowadays --- snarfed by the sigmonster |
d33p | can the cache be directly manipulated by a programmer? I would have thought it couldn't be. |
runlevel0 | hy |
zwane | d33p: yes, we can prefetch, forced invalidate etc |
runlevel0 | Will there be logs? Where? THX |
zwane | d33p: as well as arrange data/code to optimally use it |
fernand0 | yes runlevel |
d33p | Alan_Q: okay |
obiwan | Alan_Q : When we compile a program to be optimized for a specific processor, it won't work on lower-end processors (if I'm not mistaken). So if we prepare a binary to be used on many computers (say when making a distro CD), is it always necessary to take the lowest common denominator (i386 perhaps)? |
Jimzy | but since caching technology undergoes changes, is it really smart to program based on how the cache works? since it could change? |
muli | obiwan, yes... here the problem is really asm instruction families |
riel | Jimzy: it will always remain "too small", some things never change |
SnakeFooc | Alan_Q: some recommendation for the novices of linux? |
muli | if the old processor does not support some of the instructions in the binary, because you compiled for a newer cpu, you're out of luck |
n0b0dY | Will the log be placed somewhere for people who missed the talk? |
muli | so you need to compile for the lowest common denominator - i386 in the x86 case. |
manty | anybody knows how to ignore joins and parts on ircII, epic or bitchx? |
obiwan | but if we want to run processor-intensive stuff, it would be wise to squeeze every ounce of performance from the latest and greatest processor right? that is the point of optimization, if I'm not mistaken |
riel | SnakeFooc: not related to the topic of the lecture, please ask afterwards |
tarzeau | how are these blocks for sparc or powerpc comparing to x86? |
riel | n0b0dY: http://umeet.uninet.edu/ |
war | manty: egrep -vi '(join|part)' log :P |
paranoidd | obiwan: I saw some research on interpreting invalid instructions on lower-end processors (of course it only works within the same family of processors, such as x86, though) |
init64 | manufacturers may implement a way to let the OS change the cache line policy |
manty | war: :-) |
n0b0dY | riel. thanks |
rp | has anyone got this talk logged? |
SnakeFooc | riel ok |
riel | rp: yes, the log will be published on umeet.uninet.edu later |
sh0nX | rp: yeah as long as ShawnX stays online ;-) |
sh0nX | heh |
SpyderMan | Alan_Q: are the caching algorithms efficient or could they be improved if, say, the kernel could influence cache content? |
rp | thanks |
safo | (in Spanish) Perhaps the way to increase speed is to change the translation system we have in segmented and paged memory management? |
SnakeFooc | riel wran whent |
init64 | Alan_Q : some caches have another way to find lines. They use 1 "comparator" per cache line |
init64 | but I guess it's too expensive for big amounts of cache |
SnakeFooc | riel warn whent |
runlevel0 | sorry, any URL where the logs will be? those channel msgs annoy me a lot |
tarzeau | runlevel0: see topic |
sarnold | runlevel0: /topic :) |
runlevel0 | ok THX ;) |
cybtro | I am really sorry... can you repeat the site to read the logs please? thanks a lot |
Heimy | cybtro: see the topic.. |
sarnold | cybtro: /topic :) |
cybtro | :) |
cybtro | hehe yes you're right thanks |
* sh0nX looks at demo1 |
tarzeau | how do you run demo1 in single user mode? |
tarzeau | and all other processes killed? |
tarzeau | what about dos? that runs not in protected mode, but linux is in pm |
tarzeau | does it matter? |
rp | i am running demo1 with ./a.out and cannot see anything printed |
d33p | rp: same here |
ifvoid | how large are the P4 L1 and L2 cache? |
rp | d33p wait i got it |
Alan_Q | rp: you may need to wait a while or make the loop smaller for slower processors |
tarzeau | ifvoid: i wonder how's it for sparc/ultrasparc and powerpc |
runlevel0 | [ Alan ] So what demo1 does is much like a large number of perfectly normal applications... but, I'd better not run it right now, compiling X ;) |
tarzeau | ifvoid: g3 and g4 that is |
rp | Step of 1 across 4K took 80 seconds. |
rp | this is what I got |
sh0nX | Alan_Q: What is the best/average number of hits in (%) the kernel can get with the processor caches? |
davej | ifvoid: http://www.codemonkey.org.uk/x86info/results/Intel/pentium4-northwood-HT.txt |
Alan_Q | rp: make it run 1/0th of the number of times 8) |
Alan_Q | 1/0 -> 1/10 |
rp | Alan_Q How??? |
sarnold | rp: remove one of the zeros from the for loop |
d33p | Alan_Q: which loop do we make smaller? |
sarnold | d33p: (the inner for loop) |
rp | yes I have removed a zero and recompiled |
rp | Step of 1 across 4K took 8 seconds. |
fets | Step of 32 across 128K took 4 seconds. |
fets | Step of 64 across 256K took 10 seconds. |
fets | Step of 128 across 512K took 36 seconds. |
fets | this is a xeon ;-) |
rp | I have AMD k6-II |
sh0nX | Step of 1 across 4K took 29 seconds. |
sh0nX | (in X with KDE) so its gonna take longer |
Arador | sh0nX: what machine? |
sh0nX | Athlon MP 2000+ |
docelic | here, all up to 64k takes 5 secs, then 128: 7 sec, 256: 19 sec |
sh0nX | UP right now |
sh0nX | 2.5.51 |
sh0nX | Step of 2 across 8K took 26 seconds. |
sh0nX | Step of 4 across 16K took 25 seconds. |
smesjz | Step of 1 across 4K took 50 seconds. (p133/2.5.49) :) |
sh0nX | Step of 8 across 32K took 28 seconds. |
sh0nX | Step of 16 across 64K took 26 seconds. |
sh0nX | Step of 32 across 128K took 32 seconds. |
sh0nX | im sure it would be lower in single user mode |
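The demo1 source is not reproduced in this log. Judging only from its output lines ("Step of N across 4K x N"), one measurement plausibly touches a fixed number of bytes spaced `step` apart, so that larger steps spread the same number of accesses over a larger region. The sketch below is a hypothetical stand-in under that assumption; the names `strided_walk`, `touches` and `passes` are invented here and are not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one demo1 measurement (assumption: the real
 * program is not shown in this log). Reads `touches` bytes of buf that lie
 * `step` bytes apart and repeats the walk `passes` times. The sum of the
 * bytes read is returned so the reads have a visible result.
 * The caller must supply a buffer of at least (touches - 1) * step + 1 bytes.
 */
unsigned long strided_walk(const unsigned char *buf, size_t touches,
                           size_t step, unsigned long passes)
{
    unsigned long sum = 0;

    assert(buf != NULL);
    assert(step > 0);
    for (unsigned long p = 0; p < passes; p++)
        for (size_t i = 0; i < touches; i++)
            sum += buf[i * step];   /* step 1 stays dense, larger steps skip ahead */
    return sum;
}
```

With `touches` fixed at 4096, a step of 1 covers 4K and a step of 32 covers 128K, which matches the shape of the output lines pasted above. Timing each call (for example with `time()` around it) and dividing `passes` by ten corresponds to the "remove one zero" advice given in the channel.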
cored | sh0nX do you test the demo2.c program? |
tarzeau | can someone run these tests on powerpc/sparc/ultrasparc? |
sh0nX | running |
rp | Reference run one took 29 seconds. |
sh0nX | Reference run one took 4 seconds. |
sh0nX | 64K table size took 5 seconds. |
sh0nX | 128K table size took 7 seconds. |
paranoidd | is it possible to use the Linux SMP implementation of Intel multiprocessor as base for an implementation of asynchronous MP? (e.g. using the sound card's DSP) |
sh0nX | 256K table size took 15 seconds. |
cored | shit |
cored | my pentium mmx is slow :( |
sh0nX | 512K table size took 19 seconds. |
cored | i have to change the loop to 100 i think |
sh0nX | if I run this on my Pentium 233MMX |
sh0nX | it'll be much slower |
smesjz | Alan_Q: it surprises me that 5 out of demo1 tests returns 50 seconds runtime (100k iterations) |
sh0nX | 1024K table size took 22 seconds. |
sh0nX | 2048K table size took 22 seconds. |
sh0nX | 4096K table size took 21 seconds. |
zwane | paranoidd: nope |
sh0nX | odd |
rp | can someone please explain exact meaning of 1024 tablesize? |
Heimy | tarzeau: maybe we could use vore for the sparc test. It's doing nothing right now }:) |
rp | i mean tablesize |
sh0nX | 4096 too LESS than 2048?! |
sh0nX | :)) |
sh0nX | took |
* psy is gone.. autoaway after 15 min [obv/lp] |
tarzeau | Heimy: vore! heimy! that's debian :) |
Heimy | tarzeau: yep O:) |
SnakeFooc | Alan_Q: Help SYSTRAN - Internet translation technologies |
SnakeFooc | I live in Argentina, so it is rather difficult for me to buy a Pentium IV; the price is sky-high |
rp | shall I assume that the lookup tables are in main memory |
SnakeFooc | oop |
|Seifer | hi Heimy |
SnakeFooc | Alan_Q: I live in Argentina, so it is rather difficult for me to buy a Pentium IV; the price is sky-high |
Heimy | ? |
SnakeFooc | sorry |
Heimy | SnakeFooc: erm... |
Alan_Q | snake: they arent cheap here either |
ifvoid | tarzeau: this is demo1 on alpha: |
ifvoid | Step of 1 across 4K took 4 seconds. |
ifvoid | Step of 2 across 8K took 4 seconds. |
ifvoid | Step of 4 across 16K took 4 seconds. |
ifvoid | Step of 8 across 32K took 1 seconds. |
ifvoid | Step of 16 across 64K took 2 seconds. |
ifvoid | Step of 32 across 128K took 8 seconds. |
Alan_Q | its not my PIV 8) |
ifvoid | Step of 64 across 256K took 11 seconds. |
ifvoid | Step of 128 across 512K took 10 seconds. |
sh0nX | ifvoid: in single user? |
ifvoid | sh0nX: no |
sh0nX | in X? |
psy | ifvoid, whats your processor? |
BorZung | Step of 16 across 64K took 19 seconds. |
BorZung | Step of 32 across 128K took 151 seconds. |
ifvoid | psy: ev6 I think |
SnakeFooc | it was to me my keyboard |
ifvoid | but it's a 4-proc machine, with a load of about 2 atm |
psy | mine took 27 secs at all steps |
ifvoid | (I removed one 0 btw) |
sh0nX | Alan_Q: that explains why higher buffers took the same amount of time on my machine. |
ridiculum | is there any tool in linux like VTune (intel) to debug things like alan is explain? |
psy | ifvoid, how? |
ifvoid | psy: ? |
psy | you removed one 0 |
ifvoid | yeah |
ifvoid | from the inner loop |
psy | how did u? |
psy | ah |
sh0nX | oh :) |
psy | ok |
ridiculum | maybe explaining ? (my english is not good) |
ifvoid | so, multiply the run times by 10 |
psy | i see heh |
SnakeFooc | ridiculum: (in Spanish) what's up |
L-i-n-u-X | Karina: :**** |
SnakeFooc | Alan_Q: I live in Argentina, so it is rather difficult for me to buy a Pentium IV; the price is sky-high |
SnakeFooc | L-i-n-u-X !!!!11 |
runlevel0 | SnakeFooc: behave plz |
SnakeFooc | oks |
ifvoid | hmm, fiddling with the compiler options makes a lot of difference for demo1 |
sh0nX | Alan_Q: I always thought having data aligned on 512/1024, etc was a good thing, since those are common alignments and should be easier for the processor to split things into? |
ifvoid | I wonder why |
sarnold | SnakeFooc: alan responded .. he said they are expensive for him too |
Alan_Q | ifvoid: always use -O2 or -O3 for those tests, you want to measure the cpu not the compiler 8) |
SnakeFooc | okssss |
rp | i still don't get what *exactly* lookup tables are... |
sapan | Alan_Q: is demo1.c also a statement in favour of implementing a limited number of services, like httpd, in the kernel? since then send loops etc. would be much tighter? |
rp | can someone point me to an URL explaining those |
avoozl | valgrind also might be interesting to look at, the latest version also can do cache simulation and show where in a program cache misses are occurring |
rp | please? |
psy | well, i gotta go |
tarzeau | were there cpu's without cache? intels 80286? |
Heimy | rp: a table with precalculated data |
d33p | what about grof that comes with the GNU binutils? |
psy | gonna check the log later, see you |
d33p | s/grof/gprof |
sarnold | tarzeau: 486s were frequently sold without cache to make them cheaper :) |
rp | Alan says: so you can avoid lookup tables when doing things like colour conversion |
rp | Heimy: what does this mean then? |
Heimy | rp: erm... |
docelic | sarnold: hehe yea, like the "Packard Bell" 486s :)) |
Heimy | rp: Alan told about that just a few minutes ago |
tarzeau | sarnold: aha :) same with the cheap p2 with less cache |
tarzeau | and harddisks... |
tarzeau | i have one 512mb one without cache (horribly slow) |
tarzeau | i wonder what's that altivec stuff on powerpc's |
rene | 386 also had no onboard cache. |
SnakeFooc | Alan_Q: are new kernels going to keep supporting old hardware, so that the new ones become very large and heavy or much harder to compile? |
zwane | tarzeau: you have a cache |
Heimy | rp: Sometimes, if you want to speed up some expensive repetitive calculations, you can make a table of precalculated data |
tarzeau | rene: heck some didn't have a fpu inside |
Heimy | rp: And just look at that table for data |
rp | Heimy: ok |
rp | Heimy: ok...getting it |
tarzeau | rene: didn't they sell separate weitek c(?)pu's? |
tarzeau | zwane: i have a cache? |
runlevel0 | sarnold: I have an old Pentium board with this external 256 k caches |
rene | tarzeau: none did. the fpu got integrated in the 486(dx) |
sh0nX | math co-processors |
Heimy | rp: But looking at that table is sometimes even more expensive than the calculation itself |
zwane | tarzeau: you're thinking of no L2 |
sh0nX | the SX is without co-pro |
Heimy | rp: except if it fits in cache |
tarzeau | but why does the cpu access 16 bytes of l1/l2 cache at a time? the registers are 32bit since 386.. and still they are only 32bit on intel |
tarzeau | eax... |
rp | Heimy: and usually the tables are in main memory, right? |
aka_mc2 | QUESTION: but what is the alternative to the current slow cache?? ;) |
Heimy | rp: exactly. If they're in cache memory, then it will be very fast to look up those tables |
rp | Heimy: oh! got it now...thanks |
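Heimy's description of a lookup table, and the remark rp quotes from Alan about colour conversion, can be illustrated with a small sketch. The conversion chosen here (widening a 5-bit colour channel to 8 bits) is an assumption made for illustration and is not taken from the talk; all function names are hypothetical. Both versions return the same values; the difference is that the table version needs a memory load, which is only cheap while the table stays in cache.

```c
#include <assert.h>
#include <stdint.h>

/* Precalculated table: 5-bit colour channel (0..31) -> 8-bit (0..255). */
static uint8_t expand5_table[32];

/* Fill the table once, before any lookups. */
void init_expand5_table(void)
{
    for (unsigned v = 0; v < 32; v++)
        expand5_table[v] = (uint8_t)((v << 3) | (v >> 2));
}

/* Lookup version: one memory load per conversion. */
uint8_t expand5_lookup(unsigned v)
{
    assert(v < 32);
    return expand5_table[v];
}

/* Computed version: two shifts and an OR, no data memory access. */
uint8_t expand5_compute(unsigned v)
{
    assert(v < 32);
    return (uint8_t)((v << 3) | (v >> 2));
}
```

A 32-byte table like this one will normally stay cached; the trade-off Heimy describes becomes interesting for tables of tens or hundreds of kilobytes, where each lookup may have to go out to main memory.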
SnakeFooc | Alan_Q: are new kernels going to keep supporting old hardware, so that the new ones become very large and heavy or much harder to compile? |
sh0nX | Please don't repeat SnakeFooc |
SnakeFooc | sh0nX oks :D |
rene | tarzeau: on Pentium (I, MMX, Pro, II, III) the cacheline-line is 32 bytes, not 16. on p4 and athlon it's 64 bytes |
Ikarus | PPRO actually came in 2 MB as well |
SnakeFooc | sorry |
rene | cache*line* |
tarzeau | those edo memory sticks were 60ns and 70ns, how many ns are l1 and l2 caches? |
tarzeau | (72pin thingies) |
Ikarus | tarzeau: iirc, about 12 ns on Pentiums for L2 |
sh0nX | i have 60ns EDO on this P233MMX |
Ikarus | (or at least on the ones I pulled apart) |
sh0nX | it DOES make a difference |
tarzeau | sh0nX: i do in my sparc classic :) |
sh0nX | heh |
|Seifer | (in Spanish) Is it possible that GCC 3.2 has problems compiling certain things? |
SnakeFooc | Alan_Q my cuestion ? |
aka_mc2 | DDR PC2100 can be 2.0 CAS latency |
ifvoid | |Seifer: could you please repeat the question in english? |
sh0nX | ifvoid: someone can translate it :) |
SnakeFooc | |Seifer (in Spanish) I'll take care of it |
debUgo- | tarzeau: L1 and L2 caches run at full CPU speed in actual processors, so, do the math ;) |
Ikarus | debUgo-: not true |
|Seifer | SnakeFooc, ok |
Heimy | <|Seifer:> It's possible that GCC 3.2 have problems compiling some things |
Ikarus | debUgo-: on the Pentium they ran at bus speed |
Heimy | ? |
Heimy | erm... That was a question |
Heimy | <|Seifer:> It's possible that GCC 3.2 have problems compiling some things? |
Heimy | Better :) |
Ikarus | debUgo-: and on the Pentium II they ran at half CPU speed |
debUgo- | _ACTUAL_ processors, Pentium I is not an actual processor |
SnakeFooc | Alan_Q: Is it possible that GCC 3.2 has problems when compiling certain things? |
|Seifer | Heimy, thx |
Heimy | |Seifer: ;) |
sarnold | SnakeFooc: no compiler is perfect |
sh0nX | Run 1 (silly way) took 16 seconds. |
tarzeau | debUgo-: so we lose a lot of time between cpu and memory and memory and harddisk! :( umf |
sh0nX | Run 2 (smart way) took 10 seconds. |
sh0nX | Run 3 (live data only) took 5 seconds. |
ifvoid | Run 1 (silly way) took 5 seconds. |
ifvoid | Run 2 (smart way) took 2 seconds. |
ifvoid | Run 3 (live data only) took 0 seconds. |
sh0nX | interesting |
SnakeFooc | sarnold for ? |
tarzeau | i lose most time turning on/off my computer |
Ikarus | ifvoid: it optimised the third run into oblivion ? |
sh0nX | ifvoid: what processor do you have? |
sh0nX | i used no optimization |
debUgo- | tarzeau: harddisk speed really sux =P |
mcp | rootcodeman:[/tmp] # ./demo4 |
mcp | Run 1 (silly way) took 23 seconds. |
mcp | Run 2 (smart way) took 13 seconds. |
mcp | Run 3 (live data only) took 6 seconds. |
sh0nX | Run 1 (silly way) took 5 seconds. |
sh0nX | Run 2 (smart way) took 4 seconds. |
sh0nX | Run 3 (live data only) took 1 seconds. |
tarzeau | debUgo-: ever seen a tapedrive? |
sh0nX | ^^ With optimization |
ifvoid | Ikarus: cc -fast, nothing fancy |
ifvoid | sh0nX: alpha EV6 |
sh0nX | -O3 here |
Ikarus | ifvoid: which cc ? |
d33p | 48,29 and 15 on a p3 1ghz running at 700mhz |
|Seifer | (in Spanish) because I have problems compiling kernels 2.4.3 and 2.4.17 with GCC 3.2 |
sh0nX | Run 1 (silly way) took 4 seconds. |
sh0nX | Run 2 (smart way) took 4 seconds. |
sh0nX | Run 3 (live data only) took 1 seconds. |
sh0nX | gcc demo4.c -O5 -o a |
rp | Run 1 (silly way) took 62 seconds. |
sh0nX | gcc version 3.2.2 20021123 (prerelease) |
SnakeFooc | because I have problems when compiling kernel 2.4.3 and 2.4.17 with GCC 3.2 |
ifvoid | Ikarus: Compaq C V6.4-014 on Compaq Tru64 UNIX V5.1A (Rev. 1885), Compiler Driver V6.4-215 (sys) cc Driver |
runlevel0 | sh0nX -O5 ??? is that much necessary ? |
tarzeau | does rms and linus also give talks on irc? |
sh0nX | i could go higher |
sh0nX | :) |
jmgv | please let's stick to talk issues! |
debUgo- | Alan_Q: how much does cache associativity affect general memory performance? |
ridiculum | SnakeFooc not all kernels compile with gcc 3.2. i think the official compiler is still 2.95 |
sh0nX | -O99 |
jmgv | off-topic goes in #latertulia, thank you |
rp | GIMP wowowwow |
SnakeFooc | oks |
mcp | woohoo, -O5 gives an amazing performance boost |
sh0nX | Run 1 (silly way) took 4 seconds. |
sh0nX | Run 2 (smart way) took 4 seconds. |
sh0nX | Run 3 (live data only) took 1 seconds. |
sarnold | |Seifer: please save that for some other forum; alan is talking about processor optimizations, not fixing compiling problems ;) |
sh0nX | that is the maximum performance i can get |
rp | Run 2 (smart way) took 40 seconds. |
rp | Run 3 (live data only) took 18 seconds. |
ridiculum | SnakeFooc official for kernel compiles. for other things you can use gcc 3.2 |
rp | and I have 512Kb cache and still these results |
riel | rp: so your CPU spent 22 seconds playing with memory |
d33p | heh right ojn, compiling with optimisations is seriously skewing those results and the relative difference factor also decreased |
riel | rp: and only 18 seconds on the actual calculation |
rp | riel: 22 sec??? how |
mcp | Run 1 (silly way) took 7 seconds. |
mcp | Run 2 (smart way) took 5 seconds. |
mcp | Run 3 (live data only) took 2 seconds. |
onki | Run 1 (silly way) took 16 seconds. |
onki | Run 2 (smart way) took 13 seconds. |
onki | Run 3 (live data only) took 2 seconds. |
riel | rp: yes, it spends more time waiting for memory than doing something useful |
riel | onki: in your case it spent 7 times as much time waiting for memory as it spent doing real work |
onki | yeah, I see |
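riel's figures appear to be simple subtractions of the pasted run times: rp's Run 2 (40 s) minus Run 3 (18 s) leaves 22 s, and onki's Run 1 (16 s) minus Run 3 (2 s) leaves 14 s, which is seven times the 2 s of real work. Which run riel used in each case is inferred from the numbers, not stated in the log. A hypothetical helper that spells out the arithmetic:

```c
#include <assert.h>

/*
 * Seconds of a run not accounted for by the calculation itself, i.e. the
 * time attributed to waiting for memory. Both inputs are whole seconds as
 * printed by the demo ("live data only" is taken as the pure calculation).
 */
int memory_wait_seconds(int run_seconds, int live_data_seconds)
{
    assert(run_seconds >= live_data_seconds);
    return run_seconds - live_data_seconds;
}
```

Since the demo reports whole seconds only, ratios built from small values such as onki's 2 s are rough.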
rp | was it because I have lot of cache (512) |
onki | I only have 128 |
fets | I have quad xeons, so i'm interested :P |
SnakeFooc | what is the cause of the problem? |
yalu | I have one of those athlons with 512 Kbyte cache... but slow cache |
yalu | athlon classic |
fets | and I'll have some NUMA ibm x440's to play around with next month |
ifvoid | Alan_Q: won't that change for the Hammer and Itanium 3? |
rp | so does that mean memory performance does not depend only on processor but also on bus speed? |
runlevel0 | <Alan>So a dual processor machine gives us twice the problem :so this explains why we do not get 2x the performance of 1 processor |
bzzz | Alan_Q: could you describe the coherency-related problems x86 has with cache? |
debUgo- | AFAIK, dual Athlon have dual bus (a bus for each CPU) |
ridiculum | what about hyperthreading and cache coherence? |
sh0nX | so this is where spinlocks come in |
SnakeFooc | Alan_Q: can GCC 2.96 from Red Hat 7.1 be installed on RH 8 ? |
sh0nX | ahh ok :) |
sh0nX | thanks Alan :) |
ifvoid | sh0nX: what's a spinlock? |
E0x | (in Spanish) if that is so, what would be the advantage of dual servers in the end ? |
docelic | Id appreciate more on spinlocks too |
sarnold | ifvoid: a processor just spins waiting for a resource to become available in a "busywait" loop |
SnakeFooc | E0x (in Spanish) in English, hehe |
SnakeFooc | :D |
bzzz | Alan_Q: how can pci devices see data which is only in cache? |
sh0nX | :) |
sh0nX | ifvoid: what sarnold said |
sh0nX | heh |
Arador | E0x: what's the advantage of dual servers then? |
sh0nX | spinlocks let the kernel use SMP |
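sarnold's one-line description of a spinlock (a processor spinning in a "busywait" loop until a resource becomes available) can be sketched in userland C11. This is an illustration under stated assumptions only: it is not the kernel's spinlock implementation, and the type and function names (`demo_spinlock`, `demo_spin_lock`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland spinlock built on a C11 atomic flag. */
typedef struct {
    atomic_flag locked;
} demo_spinlock;

/* Put the lock into the unlocked state. */
void demo_spin_init(demo_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->locked);
}

/* Make one attempt to take the lock; returns true on success. */
bool demo_spin_trylock(demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin (busy-wait) until the lock is taken. */
void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ;   /* keep retrying; the waiting CPU does no useful work here */
}

/* Release the lock so that a spinning CPU can take it. */
void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Every failed attempt writes to the flag, so waiting processors keep pulling the cache line holding the lock away from each other, which ties in with the cache coherence cost being discussed in the channel.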
rp | Alan says : One thing the processors have to do is....... |
rp | isn't it the job of OS? |
sarnold | rp: on some systems, yes |
ridiculum | cache coherence is a hardware problem |
rp | oh! ok |
Arador | what about preempt, can preempt cause more cache misses? |
tarzeau | i noticed linux on my sparc classic (50mhz) is horribly slow compared to netbsd (1.6.x) |
rene | "we" have to kick? we as in the OS, or is that hardware-automatic (on x86) |
tarzeau | i think it was a late 2.2.x kernel (of debian), which i compiled myself for the qe, and it took 5 hours |
ridiculum | tarzeau that's true? i have a debian on a sparc classic |
sarnold | rene: on most machines, we the processor |
rene | sarnold: yeah... that's what I thought |
ridiculum | tarzeau ssh it is slow but you can compile with fpu enabled and have more speed |
tarzeau | ridiculum: yeah i noticed it swaps much less with netbsd and generally responds better (i've got x+amiwm and mpg123 on it) |
tarzeau | ridiculum: www.linuks.mine.nu/screenshots/netbsd.png |
ridiculum | tarzeau argg. i haven't X. it's only a mini-server |
tarzeau | ridiculum: you mean with -g8m or something like that? is it really faster? i've used telnet on it for remote x |
Arador | tarzeau: 404 |
tarzeau | oh wait amiwm.png |
sh0nX | Alan_Q: so we should be using the cache for SMP processors to keep data that isnt going to change much and use the processors to handle data that does change often? |
fets | alan: (ibm) this is in the -summit kernels ? |
rp | sh0nX: SMP? |
sh0nX | rp: multiprocessor |
sarnold | rp: Symmetric MultiProcessor |
rp | oh! |
sh0nX | Alan_Q: thats terrible :/ |
ridiculum | tarzeau debian compiles the whole distro with emulated math. it does not use the fpu. that's horrible |
Ikarus | ridiculum: which is of course not true |
tarzeau | ridiculum: at least they have super cow powers! |
Ikarus | (or at least as far as I have been able to see) |
ridiculum | Ikarus it's true |
ifvoid | ridiculum: it's not |
ridiculum | Ikarus debian woody |
tarzeau | ridiculum: and a large user base to get help from (unlike rh/suse), www.linuks.mine.nu/debian-worldmap (and lets cut it here before we have a distro war) |
ifvoid | ridiculum: it's just not optimized for specific processors |
ridiculum | Ikarus ssh is very,very slow |
ifvoid | ridiculum: so no MMX, SSE, 3DNow! |
sarnold | ridiculum: ssh uses integer math, not floating point |
tarzeau | ridiculum: only on really slow computers like 486 and other systems under 100mhz |
ridiculum | ifvoid sparc doesn't have MMX, SSE ;) |
acme | Alan_Q: does any of the standard libjpeg, libtiff, libpng, etc take advantage of SMP in the fashion you described? |
tarzeau | ridiculum: yes there's sparc v7,8,9 |
Ikarus | ridiculum: hold on, sparc is different, it doesn't have FPU emu in the kernel |
ridiculum | tarzeau sparc systems. i'm talking about sparc systems |
sh0nX | Alan_Q: so we want to keep both processors doing OTHER things |
tarzeau | ridiculum: and if they'd optimize for one it wouldn't run on the older ones! |
ridiculum | tarzeau debian compiles for V7 |
sh0nX | and not tasks that involve the same sort of data |
tarzeau | ridiculum: and people have v7! |
sh0nX | ahh ok :) |
tarzeau | ridiculum: check #sparc on opn/fn |
Ikarus | ridiculum: so to get it to run on ALL sparcs it is all compiled without FPU |
sh0nX | that makes sense. |
ridiculum | tarzeau argg. V7 is very very old. maybe 10 years old? more? |
tarzeau | ridiculum: there's another reason why linux is slow on sparc classics, not that optimization |
tarzeau | ridiculum: the 2.2.x tree wasn't updated and there's something really badly slow because of something (i just don't remember what) |
tarzeau | ridiculum: my classiccpu has 1991 on it, that's 10 years too |
ridiculum | tarzeau i compile a 2.4.20 and it's ok |
tarzeau | ridiculum: on your sparc classic you've got 2.4.20 ? |
ridiculum | tarzeau yes. on a sparc classic and on a SS5 |
rene | I do not wish to be a bore, and I'm certainly not a moderator, but could we keep this channel for questions to Alan? |
tarzeau | apropos .. what about that mmu-less stuff? can i run linux on my amiga 1200 (standard) one soon? |
sh0nX | Alan_Q: so when designing SMP applications, how do we tell which processor to handle which data without causing the processors to both handle the same data? |
sh0nX | in the kernel we use spinlocks |
sh0nX | but in userland i dont know how that works |
yalu | Alan_Q: is the scheduler smart enough to keep threads who share a lot of data on the same processor? |
ridiculum | sh0nX semaphores |
sh0nX | ridiculum: so threads basically |
ridiculum | sh0nX or processes. you can have 2 processes (or more) with shared memory |
zwane | Alan_Q: All this must get really interesting with Hyperthreaded cpus |
bvc | does it sound right that i am unable to use protection map from a module? |
sh0nX | we create threads in userland, and then the kernel handles this with its scheduling |
sh0nX | I see now how it fits together |
Alan_Q | sh0nx: yep |
sh0nX | ah :) |
zwane | Alan_Q: do you reckon scheduler only would suffice? How about leveraging cpu affinity for say doing bias in interrupt handling? |
jacobo | Alan_Q: could you please wait a couple of seconds after pasting text, to make the translators a bit happier? ;-) |
Alan_Q | ok |
sh0nX | so, if a program is written for UP, how does the kernel scheduler handle its data on two CPUs? or it can't |
zwane | Alan_Q: so you wouldn't go the RR ioapic interrupt distribution way as in 2.5? Or has this changed |
muli | sh0nX, what does "written for UP" mean? single process? |
sh0nX | yes |
sh0nX | uniprocessor |
sarnold | Alan_Q: does linux currently have a mechanism to specify that all interrupts should be handled by a specific [set of] CPUs? |
sh0nX | I see, so we have to use threads in our code in order to benifit SMP |
sh0nX | benefit even |
muli | sh0nX, threads or multiple processes |
muli | if you have just one thread of execution, nothing can split it up to multiple cpus for you |
sarnold | sh0nX: or just run enough applications on the machine.... |
sh0nX | muli: forked apps? |
sh0nX | I see |
sarnold | Alan_Q: is it worth prefetching the next pointer one is going to follow? |
riel | that list structure will make for an interesting list_add() ;) |
sh0nX | hehehe |
Alan_Q | sarnold : tell me when they recover |
runlevel0 | XD |
sh0nX | if we had multiple translators it would speed things up *grin* |
sh0nX | but the problem would be duplicate data being translated |
sh0nX | ;) |
yalu | they share too much data I'm afraid :) |
sh0nX | heh |
jacobo | sh0nX: we do, but we're translating serially |
Alan_Q | sh0nx: you have to share the data carefully |
sh0nX | right. |
yalu | you need an extra scheduler then :) |
jacobo | I was translating before, now Heimy is |
jacobo | the pace was good until sh0nX started to ask ;-) |
sh0nX | sorry ;-) |
sh0nX | SMP is interesting to me |
sh0nX | now all i need is one more processor have SMP :) |
muli | Alan_Q is gcc any good at optimizing for a given cpu's predicted cache usage? |
sh0nX | (soon) |
Alan_Q | muli: its hard for the compiler to do that |
sh0nX | to have SMP even. |
Alan_Q | but gcc doesnt always do a good job 8( |
grifferz | Alan_Q: how applicable are things like prefecth to userland programming without knowing about the hardware? e.g. is it possible that doing some prefecth that speeds up something for a 2 CPU x86 system will actually cause worse performance on a 4 CPU sparc? |
riel | muli: often it cannot do much |
muli | one more question, is there a "lowest common denominator" cache behaviour, or is it possible that optimizing for one cpu will be a pessimization on another? |
riel | muli: because you tell the computer what to do |
sarnold | Alan_Q: ok, i think they are about caught up :) thanks |
Alan_Q | hold them for a minute.. lets do the last 10 lines of the talk first 8) |
sh0nX | heh |
riel | muli: for example, a C compiler usually isn't allowed to change your data structures for you |
grifferz | I am impressed that I managed to typo "prefetch" the same way twice |
muli | riel, I was thinking of teaching gcc to do stuff like prefetch by itself |
viZard | Alan_Q, you can continue |
viZard | we'll catch you up ;-) |
MJesus | translators are ready, thanks |
davej | muli: There's __builtin_prefetch |
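The `__builtin_prefetch` builtin davej mentions is provided by GCC (and also by Clang). Applied to sarnold's earlier question about prefetching the next pointer while walking a list, a minimal sketch could look like the following. The `struct node` layout and the function names are hypothetical, and whether the hint actually helps depends on the processor and on how much work is done per node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* Link the caller-owned node n in front of head; returns the new head. */
struct node *push_front(struct node *head, struct node *n)
{
    assert(n != NULL);
    n->next = head;
    return n;
}

/* Sum all values, asking for the next node while working on this one. */
long sum_list(const struct node *head)
{
    long sum = 0;

    for (const struct node *n = head; n != NULL; n = n->next) {
        /*
         * Arguments: address, 0 = prefetch for reading, 1 = low temporal
         * locality. It is only a hint; a NULL address at the end of the
         * list does not fault.
         */
        __builtin_prefetch(n->next, 0, 1);
        sum += n->value;
    }
    return sum;
}
```

With so little work per node the prefetch has almost no time to complete before the pointer is followed, so the benefit in a loop this tight may be small.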
muli | riel, I think the compiler is not allowed to reorder your members, but it's free to pad as it sees fit. |
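muli's point (member order is kept, padding may be added) can be checked with `sizeof` and `offsetof`. The two structs below are hypothetical examples; the exact sizes depend on the platform ABI, so the sketch asserts only the ordering guarantee and leaves the sizes to be inspected at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Members in "as written" order: padding is likely around the int. */
struct as_written {
    char a;
    int  b;
    char c;
};

/* Same members, reordered by hand with the largest first. */
struct by_size {
    int  b;
    char a;
    char c;
};

/* The compiler must keep members in declaration order. */
static_assert(offsetof(struct as_written, a) < offsetof(struct as_written, b),
              "a precedes b");
static_assert(offsetof(struct as_written, b) < offsetof(struct as_written, c),
              "b precedes c");

/* Bytes in the struct that belong to no member. */
size_t padding_as_written(void)
{
    return sizeof(struct as_written) - (2 * sizeof(char) + sizeof(int));
}

size_t padding_by_size(void)
{
    return sizeof(struct by_size) - (2 * sizeof(char) + sizeof(int));
}
```

On common ABIs with a 4-byte `int`, the first layout is typically 12 bytes and the second 8, which is the kind of rearrangement the programmer has to do because, as riel says, the compiler is not allowed to.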
muli | did anyone measure / think about the effects of kernel preemption on cache usage? |
rene | wouldn't turning lists into "lists of little arrays" (dynamically adding a block when required) help with the prefetching for lists? |
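rene's "lists of little arrays" idea is sometimes called an unrolled linked list. A minimal sketch, assuming hypothetical names (`struct chunk`, `chunk_append`, `chunk_sum`, `chunk_free`) and a chunk capacity picked for illustration: with 8-byte pointers and 4-byte ints, 12 items make a 64-byte chunk, the size of one cache line on the P4 and Athlon according to rene's earlier message, although `calloc` does not guarantee that a chunk starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_ITEMS 12   /* hypothetical capacity, see the text above */

/* One list element holding a small array instead of a single value. */
struct chunk {
    struct chunk *next;
    size_t count;              /* number of items[] slots in use */
    int items[CHUNK_ITEMS];
};

/*
 * Append value after the last chunk, adding a new chunk when the tail is
 * full (or when tail is NULL). Returns the tail to pass to the next call,
 * or NULL if allocation failed.
 */
struct chunk *chunk_append(struct chunk *tail, int value)
{
    if (tail == NULL || tail->count == CHUNK_ITEMS) {
        struct chunk *c = calloc(1, sizeof *c);
        if (c == NULL)
            return NULL;
        if (tail != NULL)
            tail->next = c;
        tail = c;
    }
    assert(tail->count < CHUNK_ITEMS);
    tail->items[tail->count++] = value;
    return tail;
}

/* Sum every stored value: one pointer chase per chunk, not per item. */
long chunk_sum(const struct chunk *head)
{
    long sum = 0;

    for (const struct chunk *c = head; c != NULL; c = c->next)
        for (size_t i = 0; i < c->count; i++)
            sum += c->items[i];
    return sum;
}

/* Release all chunks. */
void chunk_free(struct chunk *head)
{
    while (head != NULL) {
        struct chunk *next = head->next;
        free(head);
        head = next;
    }
}
```

The first call, `chunk_append(NULL, v)`, creates the head chunk; the caller keeps that pointer for `chunk_sum` and `chunk_free`. Items inside a chunk are contiguous, so a traversal follows far fewer pointers than a one-item-per-node list, at the cost of more complicated insertion and removal in the middle.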
sh0nX | I assume we use some sort of spinlock to prevent another processor from prefetching the same data? |
Alan_Q | uggh lagging |
* sh0nX is beginning to understand how this all works slowly |
sh0nX | now if i can figure out the kernel API ;) |
MJesus | Alan this is the last in #linux: |
MJesus | [20:36] <Alan> sh0nX] I assume we use some sort of spinlock to prevent another |
MJesus | [20:36] <Alan> processor from prefetching the same data? |
sh0nX | I see |
Ikarus | recursivity, fun |
sh0nX | thats bad |
sarnold | sh0nX: a good resource for more details is a book by curt schimmel, Unix Systems for Modern Architectures: symmetric multiprocessing and caching for kernel programmers |
sh0nX | :) |
sh0nX | sarnold :) |
riel | sarnold: a very nice book, indeed |
sh0nX | sarnold: once i understand how 2.5/2.6 works then I can dive in more |
sarnold | Alan2: i've wondered if prefetching cuts memory bandwidth significantly.. have people tested with prefetch config'ed away? |
aka_mc2 | ALAN: do you know the Crusoe processor? how much cache does it have? |
tarzeau | was this talk announced somewhere? i just heard about it last minute in #debian on opn/fn |
Ikarus | tarzeau: all talk info is on the umeet website |
jacobo | tarzeau: yes |
rene | Alan2: talk seemed to be about caching alone. do things like instruction alignment make a lot of difference on modern processors? |
riel | tarzeau: it was also on the LWN calendar |
riel | and a bunch of other places |
tarzeau | riel: oh well if one doesn't read lwn :) |
Arador | Alan2: what're the effects of preempt on caching? |
docelic | so umeet is the place to look for similar events? some other places maybe ? |
bitland | could implementing UMA (Unified Memory Architecture, in some Silicon Graphics machines) in mainboard chipsets be a better solution for most problems like these? (excuse my bad english) :) |
sh0nX | I always thought we wanted to flag prefetched data as being prefetched already, im surprised another processor will fetch it again. |
sh0nX | even if it doesnt seem like much of a performance penalty doing so |
sklav | Hi guys |
tarzeau | docelic: i can't wait to see rms or linux talk on irc |
sklav | i was wondering what effect using optimization -O3 or -O5 has on the kernel?
tarzeau | alan what irc client did you use? |
bev | Blackend |
sh0nX | since Alan mentioned we dont want to work on the same data on both processors prefetching the (same) data seems to contradict this? |
aka_mc2 | ALAN: do you think the Crusoe processor, which Linux supports, will be taken into account for all these programming techniques??
Arador | sklav: AFAIK, -ON where N>3 means -O3....(don't know if it happens nowadays) |
sklav | Arador: by default the kernel itself uses -O2
sklav | but it can be changed in the Makefile to -O3 and so on
sklav | But im not sure if this causes other problems
sklav | Like a performance hit
riel | docelic: maybe #kernelnewbies could organise some isolated lectures throughout the year |
sklav | i have noticed higher load averages after i use a kernel with -O3 and/or -O5
riel | docelic: but as far as I'm concerned, UMEET is the place to cluster a bunch of lectures ;) |
mulix | riel, that would be a great idea |
mulix | like a biweekly or monthly lecture from the kernel guru of your choice :-) |
sh0nX | riel: yes |
tarzeau | mulix: yeah i'd like that too |
sh0nX | riel: I'd especially like to learn how to use PnP on 2.5 ;-) |
sh0nX | hehe |
tarzeau | Alan2: thanks alan, it was a nice talk you gave |
sarnold | docelic: note that uninet hosts other lectures; such as security, ipv6, some for medical doctors, etc.. all sorts. :) |
Arador | sklav: -O3 means that gcc adds some extra inlining that wasn't requested, i think
sklav | ok |
docelic | sarnold: even better |
jmgv | Alan? dont you think a lot of the work on register usage and other issues depends on the compiler, and that makes us lose some control over those issues?
sklav | Thanks Arador and Alan |
aka_mc2 | ALAN: is there a future alternative to cache memory? (some other system of fast data access, maybe...)
E0x | does HT technology represent an advance on this problem .... ?
mattam | Alan2: prediction's better than having a loop unrolled ? |
sh0nX | aka_mc2: parrallel data cache? ;) |
sh0nX | parallel |
aka_mc2 | ok, gracias shnox |
Ikarus | Alan: do you think a L3 cache shared between SMP processors would give a significant performance benefit |
sh0nX | how about multiprocessed cache?
sh0nX | aka_mc2: im just guessing |
sh0nX | a cache that is smart enough to decide whats needed for the processors |
jmgv | i see. but.... |
aka_mc2 | shn0x: xD |
jmgv | i've compiled the same program using gcc and the intel compiler, and intel produces code 30% faster,
Ikarus | jmgv: it averages out to a lot lower than that
sh0nX | I'd like to see a mini 'processor cache' |
sklav | jmgv: does the intel compiler have the same benefits on an AMD? |
runlevel0 | jmgv: can I use the intel compiler on an athlon ? |
Ikarus | something like 8-10 % |
jmgv | sklav: i know |
sh0nX | one that actually can make decisions with the help of the cpu |
jmgv | i dont doubt gcc's quality
sapan | Alan2: you said "we know that only memory of certain sizes at certain offsets can be cached" could you explain? |
sklav | Just curious jmgv, Im not a compiling guru by any stretch of the imagination
sklav | ;)
sklav | Although i can impress my friends! lol
jmgv | sklav: :) |
sapan | i c |
sklav | Well to be honest i find it very cool that the linux community as a whole even does irc sessions like this
sh0nX | ;) |
sh0nX | heh |
* sklav Smokes a cigarette as he reads the messages |
sh0nX | you can always expect a huge turnout for Alan's lectures :-) |
Heimy | ;) |
jacobo | yeah |
rp | I can't find where this talk is logged on the site
sklav | Im surprised he has found the time to give 1 |
jacobo | I wouldn't expect that for one of mine ;) |
error27 | Alan2: I have heard talk about creating ram with intelligence built in so you can do simple operations on it directly instead of using the cpu. What do you think about that idea?
rp | Thanks Alan |
Heimy | rp: this is not logged right now |
sarnold | rp: it takes a little while to clean and prettify the logs.. they will be up soon. :) |
sh0nX | hehe |
rp | Heimy: oh! |
sh0nX | since we're offtopic now: Alan2: Do you have patch for the amd76x_pm module for 2.5.xx? |
sarnold | rp: (look in the "congress details" piece...) |
jacobo | rp: you'll be able to find it later in "Congress Details" |
docelic | it would be nice to separate it in a document without community questions (like, to form an article of what Alan said), and then put Q/A section separately |
Heimy | rp: we have to strip some things, and beautify it a bit
Heimy | rp: :) |
rp | oh! great waiting for that |
sarnold | docelic: it is |
BorZung | the slices docelic .. |
rp | I missed many things...am very new to irc and Linux |
tarzeau | oh my god i've missed half of the talks |
runlevel0 | coywofl: Win is *faster* if you throw it from a 9-storey building XD
sapan | Alan2: I have an iPAQ with familiar running 2.4.18-rmk - if I were to optimize things in the kernel/apps in general, what should I be looking at? |
jacobo | rp: you'll find previous ones already there, in case you want to have a look :) |
rp | jacobo: I am just looking |
sh0nX | Alan2: im trying to port it right now |
sh0nX | but insmod is oopsing kernel ;-) |
docelic | BorZung: well .. yea, kind of. |
E0x | Alan2 what is prefer procesor ? |
sh0nX | Dec 13 22:22:05 unknown kernel: amd76x_pm: Version amd76x_pmhardware driver0.1.0
sh0nX | Dec 13 22:22:05 unknown kernel: amd76x_pm: Initializing southbridge Advanced Micro Devic AMD-768 [Opus] ACPI
sh0nX | Dec 13 22:22:05 unknown kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000026
sh0nX | ;/ |
sh0nX | if only i can debug this easier |
E0x | Alan2 what is your preferred processor ?*
sh0nX | for some reason |
sh0nX | Dec 13 23:21:19 unknown kernel: Process insmod (pid: 8180, threadinfo=c8b08000 task=db2e0040)
sh0nX | unless modules is still broken in 2.5.51 |
sh0nX | ;) |
apuigsech | In the GDT we find many null (unused) descriptors; is that to improve performance in how the cache is used?
sklav | Actually i have yet been able to compile 2.5.x |
sklav | I keep getting errors in make modules |
sarnold | no fan needed? ooooh! :) |
Ikarus | sarnold: well, the faster ones do need them |
sh0nX | Alan2: What is the best way to do debugging with the kernel, since i can't use UML on hardware, would a vmware work? or do i have to just keep doing the reboot thing ;) |
Ikarus | but only tiny ones |
ms | same for geode's but they are *SLOW* :o) |
EleTROn | Alan2: speak about Crusoe |
ridiculum | what's your opinion about itanium2? it's better than hammer? |
sarnold | EleTROn: he mentioned them earlier.. |
debUgo- | Alan2: did you assign reliability problems in Athlon systems to the CPU itself, the supporting chipsets, or any other stuff? |
RaD|Tz | Alan2: How about the Duron processors??? |
BorZung | sh0nX botchs? |
EleTROn | ok |
riel | neat |
sh0nX | BorZung: would bosh emulate all hardware? |
war | He Al answering these questions via privmsg or a different channcel? |
sh0nX | botches |
sapan | Alan2: [ignore if OT] I often get a "Machine Check Exception 7" on my Athlon which I can't decode even with dj's mce decoder. Any idea? |
war | s/He/Is |
sarnold | war: #linux |
war | k |
war | thx. |
davej | sapan: mail me the output and I'll take a look. |
sh0nX | BorZung: i thought it only emulates the core x86 only not other things so much |
rene | still, ia64 is a nice architecture |
sarnold | rene: as long as your goal _isn't_ compiles.. :) |
sapan | davej: hey! thanks a ton :) |
sh0nX | davej: I dont see those Exception errors on 2.5.xx anymore. |
apuigsech | Alan, in the GDT we can find some null descriptors (not used); is that to optimize cache memory usage?
rene | sarnold: the chip itself, I mean. :-) |
BorZung | sh0nX i just hear about it |
sh0nX | MCE errors |
davej | sh0nX: interesting. |
sh0nX | MCE: The hardware reports a non fatal, correctable incident occured on CPU 0. Bank 0: 9409c00000000136 |
sh0nX | something in later 2.5 must have fixed something |
rene | apuigsech: in 2.4, the gaps between per-cpu TSS/LDT are for caches |
sh0nX | 2.5.24 |
sh0nX | was when i saw an MCE |
sh0nX | but thats a while back :) |
rene | (so that CPUs don't trample on each other's cache lines)
apuigsech | TSS and LDT is not used on 2.4 |
Alan | ick |
rene | apuigsech: it sure is :-) |
angelLuis | OP to alan: MJesus |
apuigsech | :) |
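The idea rene describes, leaving gaps so that each CPU's data sits in its own cache line, can be sketched in plain C. This is a minimal illustration only and not the kernel's actual GDT layout: the 64-byte line size, the `percpu_slot` structure, `NCPUS`, and the helper functions are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common on x86 but not universal. */
#define CACHE_LINE 64

/* Hypothetical number of CPUs for this sketch. */
#define NCPUS 4

/*
 * Hypothetical per-CPU slot. The counter is aligned to a cache line and
 * the rest of the line is padding, so no two slots share a line.
 */
struct percpu_slot {
    _Alignas(CACHE_LINE) uint64_t counter;
    char pad[CACHE_LINE - sizeof(uint64_t)];
};

static struct percpu_slot slots[NCPUS];

/* Each CPU only ever writes its own slot, i.e. its own cache line. */
static void slot_add(unsigned cpu, uint64_t n)
{
    slots[cpu].counter += n;
}

/* A reader sums all slots when a global total is needed. */
static uint64_t slots_total(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < NCPUS; i++)
        sum += slots[i].counter;
    return sum;
}

/* Distance in bytes between two neighbouring slots. */
static size_t slot_stride(void)
{
    return (size_t)((char *)&slots[1] - (char *)&slots[0]);
}
```

Without the padding, four 8-byte counters would fit in one 64-byte line, and a write by one CPU would invalidate that line in every other CPU's cache even though the CPUs never touch each other's counters.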
runlevel0 | >Alan> "...or for windows bug compatibility in the bios", for god's sake, the chips are made thinking about Win bugs XD |
runlevel0 | amazing |
sh0nX | you know it's a good night when you have Alan speaking and a Hockey game tonight on tv :)) |
davej | runlevel0: rumour has it the Pentium IV is model 15 due to a bug in Win NT. |
sh0nX | it don't get better than that :)
runlevel0 | dajev: X_D |
runlevel0 | my god |
angelLuis | OP to alan2: sarnold, please |
Alan2 | (getting lots of lag problems)
Alan2 | I need to vanish very shortly too 8( |
sh0nX | Alan: do you visit #kernelnewbies? :) |
runlevel0 | ok, im going to leave ;) |
tarzeau | and #debian ? |
runlevel0 | proud of meeting you Alan and all the rest |
runlevel0 | ;) |
sklav | Guys im off as well, thanks for the enlightening conversation
sh0nX | Alan :) |
sklav | Laterzzzzz |
runlevel0 | bye, will get the logs later
davej | folks interested in the prefetching stuff Alan talked about may find the presentation at http://208.15.46.63/events/gdc2002.htm interesting |
Alan2 | oops |
E0x | <Alan2> ridiculm: right now I am betting firmly on the hammer < ---- why ?
mulix | alan, thanks for the talk... it was very interesting. |
mulix | and it was cool to see you on IRC :-) |
Alan2 | eox: its aimed at the mass market, its designed to run x86 software |
* sh0nX is very pleased with AMD processors (my first one) |
sh0nX | I especially like the MPX chipset |
bev | Alan before you leave... do you have any embedded linux devices doing crazy things in your house? |
sh0nX | hopefully, i can help get some other things implemented from it
sarnold | (it appears we lost alan...) |
* NiX has been using an AMD Thunderbird for about three years without any stability problems |
Heimy | <riel> I guess it's time to close the "official part" of this talk
sh0nX | Alan, it was a pleasure seeing you speak today |
Heimy | argh |
Heimy | :) |
rp | Alan is gone??????? |
sh0nX | thank you :) |
faiku | when will the logs be online at umeet site? |
apuigsech | Alan, what do you think about processors in future? the evolution.... |
jacobo | faiku: later today or tomorrow morning |
jacobo | %-) |
faiku | ok.thanks |
sh0nX | he vanished ;) |
angelLuis | clap clap clap clap clap clap clap clap clap
war | /on ^window "*" { echo $strftime($time() (%I:%M:%S %P)) $1- } |