--- Log opened Sat Dec 14 19:02:07 2002
tarzeauwhy did alan remove the "printer is on fire" joke from the kernel?
zwaneit might be worth people also looking at stuff like http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf
docelichow do I ignore join/part messages for the channel in xchat ?
rieldocelic: set the channel to conference mode
rielclick on the arrow to the right of the line where you talk
obiwanis it ok to ask a question at any time in #qc?
rieland click the "Conf" button
viZardobiwan, yes, is ok
tarzeauhow is it in irssi? /set ?
sh0nXconf?
sh0nX:)
sh0nXi dont think xchat 1.8.10 has conf
sh0nX;/
vegai/help ignore on irssi
rielsh0nX: it does
renetarzeau: /ignore #linux JOINS PARTS QUITS NICKS
docelicexcellent, thanks riel
sh0nXriel: hmm
rielsh0nX: click on the arrow button to the right of the line where you talk
tarzeaurene: thanks
sh0nXhh there
sh0nXfound it
sh0nXAlan_Q: How do we compare Athlon XP's to MP's with optimization? Since they are very close to being identical processors?
renethat 1 minute is grossly optimistic in my case
SnakeFoocalo
SnakeFoocspeak spanish ?
lukepsorry, but the speed of light might be the fastest possible
HeimySnakeFooc: if you want to read the lecture in Spanish, please, join #redes
rielSnakeFooc: the spanish translation is in #redes (I think)
HeimySnakeFooc: This channel is only for questions
choofiriel right
mulithe
macTijnhey riel :)
SnakeFoocHeimy oks ..
psyAlan_Q, how can we compare Athlon's processors with Pentium's?
muli          speed of light really is too slow nowadays --- snarfed by the sigmonster
d33pcan the cache be directly manipulated by a programmer.. I would have thought it wasnt?
runlevel0hy
zwaned33p: yes, we can prefetch, forced invalidate etc
runlevel0Will there be logs? Where? THX
zwaned33p: as well as arrange data/code to optimally use it
fernand0yes runlevel
d33pAlan_Q: okay
obiwanAlan_Q : When we compile a program to be optimized for a specific processor, it won't work on lower-end processors (if I'm not mistaken). So if we prepare a binary to be used on many computers (say when making a distro CD), is it always necessary to take the lowest common denominator (i386 perhaps)?
Jimzybut since caching technology undergoes changes, is it really smart to program based on how the cache works? since it could change?
muliobiwan, yes... here the problem is really asm instruction families
rielJimzy: it will always remain "too small", some things never change
SnakeFoocAlan_Q: some recommendation for the novices of linux?
muliif the old processor does not support some of the instructions in the binary, because you compiled for a newer cpu, you're out of luck
n0b0dYThe log will be placed somewhere for people who lost the talk?
muliso you need to compile for the lowest common denominator - i386 in the x86 case.
mantyanybody knows how to ignore joins and parts on ircII, epic or bitchx?
obiwanbut if we want to run processor-intensive stuff, it would be wise to squeeze every ounce of performance from the latest and greatest processor right? that is the point of optimization, if I'm not mistaken
rielSnakeFooc: not related to the topic of the lecture, please ask afterwards
tarzeauhow are these blocks for sparc or powerpc comparing to x86?
rieln0b0dY: http://umeet.uninet.edu/
warmanty: egrep -vi '(join|part) log :P
paranoiddobiwan: I saw some research on interpreting invalid instructions on lower-end processors (of course it only works within the same family of processors, such as x86, though)
init64manufacturers may implement a way to make the OS able to change the cache line policy
mantywar: :-)
n0b0dYriel. thanks
rphas anyone got this talk logged?
SnakeFoocriel ok
rielrp: yes, the log will be published on umeet.uninet.edu later
sh0nXrp: yeah as long as ShawnX stays online ;-)
sh0nXheh
SpyderManAlan_Q: are the caching algorithms efficient or could they be improved if, say, the kernel could influence cache content?
rpthanks
safoPerhaps the way to increase speed would be to change the translation system we have for segmented and paged memory management?
SnakeFoocriel warn me when?
init64Alan_Q : some caches have another way to find lines. They use 1 "comparator" per cache line
init64but I guess it's too expensive for big amounts of cache
SnakeFoocriel warn me when?
runlevel0sorry, any URL where the logs will be, those channel msgs annoys me a lot
tarzeaurunlevel0: see topic
sarnoldrunlevel0: /topic  :)
runlevel0ok THX ;)
cybtroI am really sorry... can you repeat the site lo read the logs please? thanks a lot
Heimycybtro: see the topic..
sarnoldcybtro: /topic  :)
cybtro:)
cybtrojeje yes you're right thanks
* sh0nX looks at demo1
tarzeauhow do you run demo1 in single user mode?
tarzeauand all other processes killed?
tarzeauwhat about dos? that runs not in protected mode, but linux is in pm
tarzeaudoes it matter?
rpi am running demo1 with ./a.out and cannot see anything printed
d33prp: same here
ifvoidhow large are the P4 L1 and L2 cache?
rpd33p wait i got it
Alan_Qrp: you may need to wait a while or make the loop smaller for slower processors
tarzeauifvoid: i wonder how's it for sparc/ultrasparc and powerpc
runlevel0[ Alan ] So what demo1 does is much like a large number of perfectly normal applications... ugh, I'd better not run it right now, I'm compiling X ;)
tarzeauifvoid: g3 and g4 that is
rpStep of 1 across 4K took 80 seconds.
rpthis is what I got
sh0nXAlan_Q: What is the best/average number of hits in (%) the kernel can get with the processor caches?
davejifvoid: http://www.codemonkey.org.uk/x86info/results/Intel/pentium4-northwood-HT.txt
Alan_Qrp: make it run 1/0th of the number of times 8)
Alan_Q1/0 -> 1/10
rpAlan_Q How???
sarnoldrp: remove one of the zeros from the for loop
d33pAlan_Q: which loop do we make smaller?
sarnoldd33p: (the inner for loop)
rpyes I have removed a zero and recompiled
rpStep of 1 across 4K took 8 seconds.
fetsStep of 32 across 128K took 4 seconds.
fetsStep of 64 across 256K took 10 seconds.
fetsStep of 128 across 512K took 36 seconds.
fetsthis is a xeon ;-)
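(For readers following along without the sources: below is a rough sketch of the kind of stride/working-set timing loop demo1 appears to use. The real demo1.c is not shown in this log, so the sizes, iteration count and output format here are only illustrative guesses based on the pasted results. Each pass touches the same number of array entries, spread over a larger and larger span, so once the span no longer fits in a given cache level the time jumps.)

    /* stride.c: illustrative sketch only, not Alan's demo1.c.
     * Compile with something like: gcc -O2 stride.c */
    #include <stdio.h>
    #include <time.h>

    #define MAXSPAN (512 * 1024)
    static char table[MAXSPAN];

    int main(void)
    {
        int step, i, j;

        for (step = 1; step <= 128; step *= 2) {
            int span = 4096 * step;            /* 4K, 8K, ... 512K */
            time_t start = time(NULL);

            for (j = 0; j < 100000; j++)       /* drop a zero on slow CPUs */
                for (i = 0; i < span; i += step)
                    table[i]++;                /* constant work, growing footprint */

            printf("Step of %d across %dK took %ld seconds.\n",
                   step, span / 1024, (long)(time(NULL) - start));
        }
        return 0;
    }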
rpI have AMD k6-II
sh0nXStep of 1 across 4K took 29 seconds.
sh0nX(in X with KDE) so its gonna take longer
Aradorsh0nX: what machine?
sh0nXAthlon MP 2000+
docelichere, all up to 64k takes 5 secs, then 128: 7 sec, 256: 19 sec
sh0nXUP right now
sh0nX2.5.51
sh0nXStep of 2 across 8K took 26 seconds.
sh0nXStep of 4 across 16K took 25 seconds.
smesjzStep of 1 across 4K took 50 seconds. (p133/2.5.49) :)
sh0nXStep of 8 across 32K took 28 seconds.
sh0nXStep of 16 across 64K took 26 seconds.
sh0nXStep of 32 across 128K took 32 seconds.
sh0nXim sure it would be lower in single user mode
coredsh0nX do you test the demo2.c program?
tarzeaucan someoen run these tests on powerpc/sparc/ultrasparc?
sh0nXrunning
rpReference run one took 29 seconds.
sh0nXReference run one took 4 seconds.
sh0nX64K table size took 5 seconds.
sh0nX128K table size took 7 seconds.
paranoiddis it possible to use the Linux SMP implementation of Intel multiprocessor as base for an implementation of asynchronous MP? (e.g. using the sound card's DSP)
sh0nX256K table size took 15 seconds.
coredshit
coredmy pentium mmx is slow :(
sh0nX512K table size took 19 seconds.
coredi have to change the loop to 100 i think
sh0nXif I run this on my Pentium 233MMX
sh0nXit'll be much slower
smesjzAlan_Q: it surprises me that 5 of the demo1 tests return 50 seconds runtime (100k iterations)
sh0nX1024K table size took 22 seconds.
sh0nX2048K table size took 22 seconds.
sh0nX4096K table size took 21 seconds.
zwaneparanoidd: nope
sh0nXodd
rpcan someone please explain exact meaning of 1024 tablesize?
Heimytarzeau: maybe we could use vore for the sparc test. It's doing nothing right now }:)
rpi mean tablesize
sh0nX4096K took LESS than 2048K?!
sh0nX:))
* psy is gone.. autoaway after 15 min [obv/lp]
tarzeauHeimy: vore! heimy! that's debian :)
Heimytarzeau: yep O:)
:: Join: reynaert (me134.184.49.28) to #qc
SnakeFoocAlan_Q: (via SYSTRAN machine translation) I live in Argentina so it's rather hard for me to buy a Pentium IV, the price is sky-high
rpshall I assume that the lookup tables are in main memory
SnakeFoocoop
|Seiferhi Heimy
SnakeFoocAlan_Q: I live in Argentina so it's rather hard for me to buy a Pentium IV, the price is sky-high
Heimy?
SnakeFoocsorry
HeimySnakeFooc: erm...
Alan_Qsnake: they arent cheap here either
ifvoidtarzeau: this is demo1 on alpha:
ifvoidStep of 1 across 4K took 4 seconds.
ifvoidStep of 2 across 8K took 4 seconds.
ifvoidStep of 4 across 16K took 4 seconds.
ifvoidStep of 8 across 32K took 1 seconds.
ifvoidStep of 16 across 64K took 2 seconds.
ifvoidStep of 32 across 128K took 8 seconds.
Alan_Qits not my PIV 8)
ifvoidStep of 64 across 256K took 11 seconds.
ifvoidStep of 128 across 512K took 10 seconds.
sh0nXifvoid: in single user?
ifvoidsh0nX: no
sh0nXin X?
psyifvoid, whats your processor?
BorZungStep of 16 across 64K took 19 seconds.
BorZungStep of 32 across 128K took 151 seconds.
ifvoidpsy: ev6 I think
SnakeFoocit was to me my keyboard
ifvoidbut it's a 4-proc machine, with a load of about 2 atm
psymine took 27 secs at all steps
ifvoid(I removed one 0 btw)
sh0nXAlan_Q: that explains why higher buffers took the same amount of time on my machine.
ridiculumis there any tool in linux like Vtune (intel) to debug things like alan is explain?
psyifvoid, how?
ifvoidpsy: ?
psyyou removed one 0
ifvoidyeah
ifvoidfrom the inner loop
psyhow did u?
psyah
sh0nXoh :)
psyok
ridiculummaybe explaning ? (my english it's not good)
ifvoidso, multiply the run times by 10
psyi see heh
SnakeFoocridiculum . what's up
L-i-n-u-XKarina: :****
SnakeFoocAlan_Q: I live in Argentina so it's rather hard for me to buy a Pentium IV, the price is sky-high
SnakeFoocL-i-n-u-X !!!!11
runlevel0SnakeFooc: behave plz
SnakeFoocoks
ifvoidhmm, fiddling with the compiler options makes a lot of difference for demo1
sh0nXAlan_Q: I always thought having data aligned on 512/1024, etc was a good thing, since those are common alignments and should be easier for the processor to split things into?
ifvoidI wonder why
sarnoldSnakeFooc: alan responded .. he said they are expensive for him too
Alan_Qifvoid: always use -O2 or -O3 for those tests, you want to measure the cpu not the compiler 8)
SnakeFoocokssss
rpi still don't get what *exactly* lookup tables are...
sapanAlan_Q: is demo1.c also a statement in favour of implementing limited no.s of svcs, like httpd in the kernel? since then send loops etc. would be much tighter?
rpcan someone point me to an URL explaining those
avoozlvalgrind also might be interesting to look at, the latest version also can do cache simulation and show where in a program cache misses are occuring
rpplease?
psywell, i gotta go
tarzeauwere there cpu's without cache? intels 80286?
Heimyrp: a table with precalculated data
d33pwhat about grof that comes with the GNU binutils?
psygonna check the log later, see you
d33ps/grof/gprof
sarnoldtarzeau: 486s were frequently sold without cache to make them cheaper :)
rpALAN SAYS: so you can avoid lookup tables when doing things like colour conversion
rpHeimy: what does this mean then?
Heimyrp: erm...
docelicsarnold: hehe yea, like the "Packard Bell" 486s :))
Heimyrp: Alan told about that just a few minutes ago
tarzeausarnold: aha :) same with the cheap p2 with less cache
tarzeauand harddisks...
tarzeaui've have one 512mb one without cache (horribly slow)
tarzeaui wonder what's that altivec stuff on powerpc's
rene386 also had no onboard cache.
SnakeFoocAlan_Q: are new kernels going to keep supporting old machines, given that the new ones are very big and heavy and make compilation much more difficult?
zwanetarzeau: you have a cache
Heimyrp: Sometimes, if you want to speed up some expensive repetitive calculations, you can make a table of precalculated data
tarzeaurene: heck some didn't have a fpu inside
Heimyrp: And just look at that table for data
rpHeimy: ok
rpHeimy: ok...getting it
tarzeaurene: didn't they sell separate weitek c(?)pu's?
tarzeauzwane: i have a cache?
runlevel0sarnold: I have an old Pentium board with this external 256 k caches
renetarzeau: none did. the fpu got integrated in the 486(dx)
sh0nXmath co-processors
Heimyrp: But looking at that table is sometimes even more expensive than the calculation itself
zwanetarzeau: you're thinking of no L2
sh0nXthe SX is without co-pro
Heimyrp: except if it fits on caché
tarzeaubut why does the cpu access the l1/l2 cache in 16-byte blocks? the registers have been 32bit since the 386.. and they are still only 32bit on intel
tarzeaueax...
rpHeimy: and usually the tables are in main memory, right?
aka_mc2QUESTION:  but what is the alternative to the current slow cache? ;)
Heimyrp: exactly. If they're on cache memory, then it will be very fast to look up that tables
rpHeimy: oh! got it now...thanks
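(To make the lookup-table idea concrete, here is a small hypothetical example: an 8-bit gamma/colour conversion table. The 256-entry table easily fits in L1 cache, so each pixel becomes one cheap array read; with a much larger table the lookups themselves start missing the cache, which is when recomputing can win. Link with -lm for pow().)

    #include <math.h>

    static unsigned char gamma_lut[256];

    /* build the table once: the expensive pow() runs only 256 times */
    void build_lut(double gexp)
    {
        int i;
        for (i = 0; i < 256; i++)
            gamma_lut[i] = (unsigned char)(255.0 * pow(i / 255.0, gexp) + 0.5);
    }

    /* then every pixel is a single array lookup instead of a pow() call */
    void convert(unsigned char *pixels, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            pixels[i] = gamma_lut[pixels[i]];
    }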
SnakeFoocAlan_Q: are new kernels going to keep supporting old machines, given that the new ones are very big and heavy and make compilation much more difficult?
sh0nXPlease don't repeat SnakeFooc
SnakeFoocsh0nX oks :D
renetarzeau: on Pentium (I, MMX, Pro, II, III) the cacheline-line is 32 bytes, not 16. on p4 and athlon it's 64 bytes
IkarusPPRO actually came in 2 MB aswell
SnakeFoocsorry
renecache*line*
tarzeauthose edo memory sticks were 60ns and 70ns, how many ns are the l1 and l2 caches?
tarzeau(72pin thingies)
Ikarustarzeau: iirc, about 12 ns on Pentiums for L2
sh0nXi have 60ns EDO on this P233MMX
Ikarus(or atleast on the ones I pulled apart)
sh0nXit DOES make a difference
tarzeaush0nX: i do in my sparc classic :)
sh0nXheh
|SeiferIs it possible that GCC 3.2 has problems compiling certain things?
SnakeFoocAlan_Q my question ?
aka_mc2DDR PC2100 can be 2.0 CAS latency
ifvoid|Seifer: could you please repeat the question in english?
sh0nXifvoid: someone can translate it :)
SnakeFooc|Seifer I'll take care of it
debUgo-tarzeau: L1 and L2 caches run at full CPU speed in current processors, so, do the math ;)
IkarusdebUgo-: not true
|SeiferSnakeFooc, ok
Heimy<|Seifer:> It's possible that GCC 3.2 have problems compiling some things
IkarusdebUgo-: on the Pentium they ran at bus speed
Heimy?
Heimyerm... That was a question
Heimy<|Seifer:> It's possible that GCC 3.2 have problems compiling some things?
HeimyBetter :)
IkarusdebUgo-: and on the Pentium II they ran at 1/2'th CPU speed
debUgo-_CURRENT_ processors, the Pentium I is not a current processor
SnakeFoocAlan_Q: Is it possible that GCC 3.2 has problems when compiling certain things?
|SeiferHeimy, thx
Heimy|Seifer: ;)
sarnoldSnakeFooc: no compiler is perfect
sh0nXRun 1 (silly way) took 16 seconds.
tarzeaudebUgo-: so we lose a lot of time between cpu and memory, and between memory and harddisk! :( umf
sh0nXRun 2 (smart way) took 10 seconds.
sh0nXRun 3 (live data only) took 5 seconds.
ifvoidRun 1 (silly way) took 5 seconds.
ifvoidRun 2 (smart way) took 2 seconds.
ifvoidRun 3 (live data only) took 0 seconds.
sh0nXinteresting
SnakeFoocsarnold for ?
tarzeaui lose most time turning on/off my computer
Ikarusifvoid: it optimised the third run into oblivion ?
sh0nXifvoid: what processor do you have?
sh0nXi used no optimization
debUgo-tarzeau: harddisk speed really sux =P
mcprootcodeman:[/tmp] # ./demo4
mcpRun 1 (silly way) took 23 seconds.
mcpRun 2 (smart way) took 13 seconds.
mcpRun 3 (live data only) took 6 seconds.
sh0nXRun 1 (silly way) took 5 seconds.
sh0nXRun 2 (smart way) took 4 seconds.
sh0nXRun 3 (live data only) took 1 seconds.
tarzeaudebUgo-: ever seen a tapedrive?
sh0nX^^ With optimization
ifvoidIkarus: cc -fast, nothing fancy
ifvoidsh0nX: alpha EV6
sh0nX-O3 here
Ikarusifvoid: which cc ?
d33p48,29 and 15 on a p3 1ghz running at 700mhz
|Seiferbecause I have problems compiling kernels 2.4.3 and 2.4.17 with GCC 3.2
sh0nXRun 1 (silly way) took 4 seconds.
sh0nXRun 2 (smart way) took 4 seconds.
sh0nXRun 3 (live data only) took 1 seconds.
sh0nXgcc demo4.c -O5 -o a
rpRun 1 (silly way) took 62 seconds.
sh0nXgcc version 3.2.2 20021123 (prerelease)
SnakeFoocbecause I have problems when compiling kernels 2.4.3 and 2.4.17 with GCC 3.2
ifvoidIkarus: Compaq C V6.4-014 on Compaq Tru64 UNIX V5.1A (Rev. 1885), Compiler Driver V6.4-215 (sys) cc Driver
runlevel0sh0nX -O5 ??? is that much necessary ?
tarzeaudoes rms and linus also give talks on irc?
sh0nXi could go higher
sh0nX:)
jmgvplease let's fix talk issues!
debUgo-Alan_Q: how much does cache associativity affect general memory performance?
ridiculumSnakeFooc not all kernels compile with gcc 3.2. i think the official compiler is still 2.95
sh0nX-O99
jmgvofftopic at #latertulia thank you
rpGIMP wowowwow
SnakeFoocoks
mcpwoohoo, -O5 gives an amazing performance boost
sh0nXRun 1 (silly way) took 4 seconds.
sh0nXRun 2 (smart way) took 4 seconds.
sh0nXRun 3 (live data only) took 1 seconds.
sarnold|Seifer: please save that for some other forum; alan is talking about processor optimizations, not fixing compiling problems ;)
sh0nXthat is the maximum performance i can get
rpRun 2 (smart way) took 40 seconds.
rpRun 3 (live data only) took 18 seconds.
ridiculumSnakeFooc oficial for kernel compile. for other things you can use gcc 3.2
rpand I have 512Kb cache and still these results
rielrp: so your CPU spent 22 seconds playing with memory
d33pheh right on, compiling with optimisations is seriously skewing those results and the relative difference factor also decreased
rielrp: and only 18 seconds on the actual calculation
rpriel: 22  sec??? how
mcpRun 1 (silly way) took 7 seconds.
mcpRun 2 (smart way) took 5 seconds.
mcpRun 3 (live data only) took 2 seconds.
onkiRun 1 (silly way) took 16 seconds.
onkiRun 2 (smart way) took 13 seconds.
onkiRun 3 (live data only) took 2 seconds.
rielrp: yes, it spends more time waiting for memory than doing something useful
rielonki: in your case it spent 7 times as much time waiting for memory as it spent doing real work
onkiyeah, I see
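(A sketch of the data-layout effect the demo4 numbers seem to show; the real demo4.c is not in this log and its middle "smart way" case is omitted here. Walking a fat struct drags a whole cache line per element through the cache even though only a few bytes are used, while packing the live fields densely makes use of every line fetched.)

    #include <stdio.h>
    #include <time.h>

    #define N (256 * 1024)

    struct fat {
        int  live;          /* the only field the loop actually uses          */
        char dead[60];      /* cold data riding along in the same cache line  */
    };

    static struct fat records[N];
    static int live_only[N];

    int main(void)
    {
        long sum = 0, t;
        int pass, i;

        t = time(NULL);
        for (pass = 0; pass < 500; pass++)
            for (i = 0; i < N; i++)
                sum += records[i].live;     /* "silly": ~64 bytes fetched per element */
        printf("fat structs took %ld seconds\n", (long)time(NULL) - t);

        t = time(NULL);
        for (pass = 0; pass < 500; pass++)
            for (i = 0; i < N; i++)
                sum += live_only[i];        /* "live data only": 4 bytes per element */
        printf("live data only took %ld seconds (sum %ld)\n", (long)time(NULL) - t, sum);
        return 0;
    }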
rpwas it because I have lot of cache (512)
onkiI only have 128
fetsI have quad xeons, so i'm interested :P
SnakeFoocso that is the cause of the problem?
yaluI have one of those athlons witrh 512 Kbyte cache... but slow cache
yaluathlon classic
fetsand I'll have some NUMA ibm x440's to play around with next month
ifvoidAlan_Q: won't that change for the Hammer and Itanium 3?
rpso does that mean memory performance does not depend only on processor but also on bus speed?
runlevel0<Alan>So a dual processor machine gives us twice the problem  :so this explains why we do not get 2x the performance of 1 processor
bzzzAlan_Q: could you describe the cache coherency related problems x86 has to deal with?
debUgo-AFAIK, dual Athlon have dual bus (a bus for each CPU)
ridiculumwhat about hyperthreading and cache coherence?
sh0nXso this is where spinlocks come in
SnakeFoocAlan_Q: can GCC 2.96 from Red Hat 7.1 be put into RH 8 ?
sh0nXahh ok :)
sh0nXthanks Alan :)
ifvoidsh0nX: what's a spinlock?
E0xif that's so, what would be the advantage of dual servers in the end?
docelicId appreciate more on spinlocks too
sarnoldifvoid: a processor just spins waiting for a resource to become available in a "busywait" loop
SnakeFoocE0x in english hehe
SnakeFooc:D
bzzzAlan_Q: how can pci devices see data which is in cache only?
sh0nX:)
sh0nXifvoid: what sarnold said
sh0nXheh
AradorE0x: what's the advantage of dual servers then?
sh0nXspinlocks let the kernel use SMP
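(To make sarnold's description concrete, here is a minimal busy-wait lock sketch for x86, assuming GCC inline assembly. Illustrative only: the real kernel spinlocks, e.g. include/asm-i386/spinlock.h in 2.4/2.5, do rather more, and kernel code should always use the real spin_lock() API.)

    typedef struct {
        volatile int locked;            /* 0 = free, 1 = held */
    } myspinlock_t;

    static inline void my_spin_lock(myspinlock_t *l)
    {
        int val = 1;
        do {
            /* atomically swap 1 into the lock word; xchg with a memory
             * operand is implicitly locked on x86 */
            asm volatile("xchgl %0, %1"
                         : "+r" (val), "+m" (l->locked)
                         : : "memory");
        } while (val != 0);             /* old value was 1: someone holds it, spin */
    }

    static inline void my_spin_unlock(myspinlock_t *l)
    {
        l->locked = 0;                  /* a plain store releases the lock on x86 */
    }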
rpAlan says :  One thing the processors have to do is.......
rpisn't it the job of OS?
sarnoldrp: on some systems, yes
ridiculumcache coherence is a hardware problem
rpoh! ok
Aradorwhat about preempt, can preempt cause more cache misses?
tarzeaui noticed linux on my sparc classic (50mhz) is horribly slow compared to netbsd (1.6.x)
rene"we" have to kick? we as in the OS, or is that hardware-automatic (on x86)
tarzeaui think it was a late 2.2.x kernel (of debian), which i compiled myself for the qe, and it took 5 hours
ridiculumtarzeau that's true? i have a debian on a sparc classic
sarnoldrene: on most machines, we the processor
renesarnold: yeah... that's what I thought
ridiculumtarzeau ssh it is slow but you can compile with fpu enabled and have more speed
tarzeauridiculum: yeah i noticed it swaps much less with netbsd and generally responds better (i've got x+amiwm and mpg123 on it)
tarzeauridiculum: www.linuks.mine.nu/screenshots/netbsd.png
ridiculumtarzeau argg. i haven't X. it's only a mini-server
tarzeauridiculum: you mean with -g8m or something like that? is it really faster? i've used telnet on it for remote x
Aradortarzeau: 404
tarzeauoh wait amiwm.png
sh0nXAlan_Q: so we should be using the cache for SMP processors to keep data that isnt going to change much and use the processors to handle data that does change often?
fetsalan: (ibm) this is in the -summit kernels ?
rpsh0nX: SMP?
sh0nXrp: multiprocessor
sarnoldrp: Symmettric MultiProcessor
rpoh!
sh0nXAlan_Q: thats terrible :/
ridiculumtarzeau debian compiles the whole distro with math emulation. it doesn't use the fpu. that's horrible
Ikarusridiculum: which is ofcourse not true
tarzeauridiculum: at least they have super cow powers!
Ikarus(or atleast as far as I have been able to see)
ridiculumIkarus it's true
ifvoidridiculum: it's not
ridiculumIkarus debian woody
tarzeauridiculum: and a large user base to get help from (unlike rh/suse), www.linuks.mine.nu/debian-worldmap (and lets cut it here before we have a distro war)
ifvoidridiculum: it's just not optimized for specific processors
ridiculumIkarus ssh is very,very slow
ifvoidridiculum: so no MMX, SSE, 3DNow
sarnoldridiculum: ssh uses integer math, not floating point
tarzeauridiculum: only on really slow computers like 486 and other systems under 100mhz
ridiculumifvoid sparc doesn't have MMX, SSE ;)
acmeAlan_Q:  does any of the standard libjpeg, libtiff, libpng, etc take advantage of SMP in the fashion you described?
tarzeauridiculum: yyes there's sparc v7,8,9
Ikarusridiculum: hold on, sparc is different, it doesn't have FPU emu in the kernel
ridiculumtarzeau sparc systems. i'm talking about sparc systems
sh0nXAlan_Q: so we want to keep both processors doing OTHER things
tarzeauridiculum: and if they'd optimize for one it wouldn't run on the older ones!
ridiculumtarzeau debian compile for V7
sh0nXand not tasks that involve the same sort of data
tarzeauridiculum: and people have v7!
sh0nXahh ok :)
tarzeauridiculum: check #sparc on opn/fn
Ikarusridiculum: so to get it to run on ALL sparcs it is all compiled without FPU
sh0nXthat makes sense.
ridiculumtarzeau argg. V7 it's very very old. maybe 10 years old? more?
tarzeauridiculum: it's something else that makes linux slow on sparc classics, not that optimization
tarzeauridiculum: the 2.2.x tree wasn't updated and there's something really badly slow because of something (i just don't remember what)
tarzeauridiculum: my classiccpu has 1991 on it, that's 10 years too
ridiculumtarzeau i compile a 2.4.20 and it's ok
tarzeauridiculum: on you sparc classic you've got 2.4.20 ?
ridiculumtarzeau yes. on a sparc classic and on a SS5
reneI do not wish to be a bore, and I'm certainly not a moderator, but could we keep this channel for questions to Alan?
tarzeauapropos .. what about that mmu-less stuff? can i run linux on my amiga 1200 (standard) one soon?
sh0nXAlan_Q: so when designing SMP applications, how do we tell which processor to handle which data without causing the processors to both handle the same data?
sh0nXin the kernel we use spinlocks
sh0nXbut in userland i dont know how that works
yaluAlan_Q: is the scheduler smart enough to keep threads who share a lot of data on the same processor?
ridiculumsh0nX semaphores
sh0nXridiculum: so threads basically
ridiculumsh0nX or process. you can have 2 process (or more) with shared memory
zwaneAlan_Q: All this must get really interesting with Hyperthreaded cpus
bvcdoes it sound right that i am unable to use protection map from a module?
sh0nXwe create threads in userland, and then the kernel handles this with its scheduling
sh0nXI see now how it fits together
Alan_Qsh0nx: yep
sh0nXah :)
zwaneAlan_Q: do you reckon scheduler only would suffice? How about leveraging cpu affinity for say doing bias in interrupt handling?
jacoboAlan_Q: could you please wait a couple of seconds after pasting text, to make the translators a bit happier? ;-)
Alan_Qok
sh0nXso, if a program is written for UP, how does the kernel scheduler handle its data on two CPUs? or it can't
zwaneAlan_Q: so you wouldn't go the RR ioapic interrupt distribution way as in 2.5? Or has this changed
mulish0nX, what does "written for UP" mean? single process?
sh0nXyes
sh0nXuniprocessor
sarnoldAlan_Q: does linux currently have a mechanism to specify that all interrupts should be handled by a specific [set of] CPUs?
sh0nXI see, so we have to use threads in our code in order to benifit SMP
sh0nXbenefit even
mulish0nX, threads or multiple processes
muliif you have just one thread of execution, nothing can split it up to multiple cpus for you
sarnoldsh0nX: or just run enough applications on the machine....
sh0nXmuli: forked apps?
sh0nXI see
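(A minimal userland sketch of what muli describes: two POSIX threads, each working on its own half of an array, so an SMP kernel can run them on different CPUs and they never write to the same cache lines. Names and sizes are illustrative; build with something like gcc -O2 split.c -lpthread.)

    #include <pthread.h>
    #include <stdio.h>

    #define N (1024 * 1024)
    static double data[N];

    static void *worker(void *arg)
    {
        long half = (long)arg;                     /* 0 = first half, 1 = second half */
        long i, start = half * (N / 2), end = start + N / 2;

        for (i = start; i < end; i++)
            data[i] = data[i] * 2.0 + 1.0;         /* independent work, nothing shared */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        long h;

        for (h = 0; h < 2; h++)
            pthread_create(&t[h], NULL, worker, (void *)h);
        for (h = 0; h < 2; h++)
            pthread_join(t[h], NULL);

        printf("data[0] = %f, data[N-1] = %f\n", data[0], data[N - 1]);
        return 0;
    }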
sarnoldAlan_Q: is it worth prefetching the next pointer one is going to follow?
rielthat list structure will make for an interesting list_add() ;)
sh0nXhehehe
Alan_Qsarnold : tell me when they recover
runlevel0XD
sh0nXif we had multiple translators it would speed things up *grin*
sh0nXbut the problem would be duplicate data being translated
sh0nX;)
yaluthey share too much data I'm afraid :)
sh0nXheh
jacobosh0nX: we do, but we're translating serially
Alan_Qsh0nx: you have to share the data carefully
sh0nXright.
yaluyou need an extra scheduler then :)
jacoboI was translating before, now Heimy is
jacobothe pace was good until sh0nX started to ask ;-)
sh0nXsorry ;-)
sh0nXSMP is interesting to me
sh0nXnow all i need is one more processor have SMP :)
muliAlan_Q is gcc any good at optimizing for a given cpu's predicted cache usage?
sh0nX(soon)
Alan_Qmuli: its hard for the compiler to do that
sh0nXto have SMP even.
Alan_Qbut gcc doesnt always do a good job 8(
grifferzAlan_Q: how applicable are things like prefecth to userland programming without knowing about the hardware?  e.g. is it possible that doing some prefecth that speeds up something for a 2 CPU x86 system will actually cause worse performance on a 4 CPU sparc?
rielmuli: often it cannot do much
mulione more question, is there a "lowest common denominator" cache behaviour, or is it possible that optimizing for one cpu will be a pessimization on another?
rielmuli: because you tell the computer what to do
sarnoldAlan_Q: ok, i think they are about caught up :) thanks
Alan_Qhold them for a minute.. lets do the last 10 lines of the talk first 8)
sh0nXheh
rielmuli: for example, a C compiler usually isn't allowed to change your data structures for you
grifferzI am impressed that I managed to typo "prefetch" the same way twice
muliriel, I was thinking of teaching gcc to do stuff like prefetch by itself
viZardAlan_Q, you can continue
viZardwe´ll catch you up ;-)
MJesustranslators are ready, thanks
davejmuli: There's __builtin_prefetch
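(A sketch of how GCC's __builtin_prefetch might be used on a linked-list walk of the kind discussed here. It is only a hint, it needs a reasonably recent gcc, and with a plain list you can only see one node ahead, which limits how much memory latency the prefetch can actually hide.)

    struct node {
        struct node *next;
        int payload;
    };

    int sum_list(struct node *n)
    {
        int sum = 0;

        while (n) {
            if (n->next)
                __builtin_prefetch(n->next);   /* start pulling the next node in... */
            sum += n->payload;                 /* ...while we work on this one      */
            n = n->next;
        }
        return sum;
    }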
muliriel, I think the compiler is not allowed to reorder your members, but it's free to pad as it sees fit.
mulidid anyone measure / think about the effects of kernel preemption on cache usage?
renewouldn't turning lists into "lists of little arrays" (dynamically adding a block when required) help with the prefetching for lists?
sh0nXI assume we use some sort of spinlock to prevent another processor from prefetching the same data?
Alan_Quggh lagging
* sh0nX is beginning to understand how this all works slowly
sh0nXnow if i can figure out the kernel API ;)
MJesusAlan this is the last in #linux:
MJesus[20:36] <Alan> sh0nX] I assume we use some sort of spinlock to prevent another
MJesus[20:36] <Alan>          processor from prefetching the same data?
sh0nXI see
Ikarusrecursivity, fun
sh0nXthats bad
sarnoldsh0nX: a good resource for more details is a book by curt schimmel, Unix Systems for Modern Architectures: symmetric multiprocessing and caching for kernel programmers
sh0nX:)
sh0nXsarnold :)
rielsarnold: a very nice book, indeed
sh0nXsarnold: once i get to understand how 2.5/2.6 works i then I can dive in more
sarnoldAlan2: i've wondered if prefetching cuts memory bandwidth significantly.. have people tested with prefetch config'ed away?
aka_mc2ALAN: do you know the Crusoe processor? how much cache does it have?
tarzeauwas this talk announced somewhere? i just heard about it last minute in #debian on opn/fn
Ikarustarzeau: all talk info is on the umeet website
jacobotarzeau: yes
reneAlan2: the talk seemed to be about caching alone. do things like instruction alignment make a lot of difference on modern processors?
rieltarzeau: it was also on the LWN calendar
rieland a bunch of other places
tarzeauriel: oh well if one doesn't read lwn :)
AradorAlan2: what're the effects of preempt on caching?
docelicso umeet is the place to look for similar events? some other places maybe ?
bitlandcould implementing UMA (Unified Memory Architecture, as in some Silicon Graphics machines) in mainboard chipsets be a better solution for most problems like these? (excuse my bad english) :)
sh0nXI always thought we wanted to flag prefetched data as being prefetched already, im surprised another processor will fetch it again.
sh0nXeven if it doesn't seem like much of a performance penalty doing so
sklavHi guys
tarzeaudocelic: i can't wait to see rms or linus talk on irc
sklavi was wondering what effect using optimization -O3 or -O5 has on the kernel?
tarzeaualan what irc client did you use?
bevBlackend
sh0nXsince Alan mentioned we dont want to work on the same data on both processors, prefetching the (same) data seems to contradict this?
aka_mc2ALAN: do you think the Crusoe processor, which Linux supports, will be considered for all these programming techniques??
Aradorsklav: AFAIK, -ON where N>3 means -O3....(don't know if it happens nowadays)
sklavArador: by default the kernel itself uses -O2
sklavbut it can be changed in the Makefile to -O3 and so on
sklavBut im not sure if this causes other problems
sklavLike a performance hit
rieldocelic: maybe #kernelnewbies could organise some isolated lectures throughout the year
sklavi have noticed higher load averages after i use a kernel with -O3 and/or -O5
rieldocelic: but as far as I'm concerned, UMEET is the place to cluster a bunch of lectures ;)
mulixriel, that would be a great idea
mulixlike a biweekly or monthly lecture from the kernel guru of your choice :-)
sh0nXriel: yes
tarzeaumulix: yeah i'd like that too
sh0nXriel: I'd especially like to learn how to use PnP on 2.5 ;-)
sh0nXhehe
tarzeauAlan2: thanks alan, it was a nice talk you gave
sarnolddocelic: note that uninet hosts other lectures; such as security, ipv6, some for medical doctors, etc.. all sorts. :)
Aradorsklav: -O3 means that gcc adds some extra inlining that weren't requested, i think
sklavok
docelicsarnold: even better
jmgvAlan? don't you think a lot of the work about register usage and other questions depends on the compiler, and that makes us lose some control over those issues?
sklavThanks Arador and Alan
aka_mc2ALAN: is there, in the future, an alternative to cache memory? (maybe some other system of fast data access...)
E0xdoes the technology of HT (hyperthreading) represent an advance on this problem .... ?
mattamAlan2: is prediction better than having a loop unrolled ?
sh0nXaka_mc2: parrallel data cache? ;)
sh0nXparallel
aka_mc2ok, thanks shnox
IkarusAlan: do you think a L3 cache shared between SMP processors would give a significant performance benefit
sh0nXhow about multiprocessed cache?
sh0nXaka_mc2: im just guessing
sh0nXa cache that is smart enough to decide whats needed for the processors
jmgvi see. but....
aka_mc2shn0x: xD
jmgvi've compiled the same program using gcc and the intel compiler, and intel produces code 30% faster,
Ikarusjmgv: it averages out to a lot lower than that
sh0nXI'd like to see a mini 'processor cache'
sklavjmgv: does the intel compiler have the same benefits on an AMD?
runlevel0jmgv: can I use the intel compiler on an athlon ?
Ikarussomething like 8-10 %
jmgvsklav: i know
sh0nXone that actually can make decisions with the help of the cpu
jmgvi dont doubt about gcc quality
sapanAlan2: you said "we know that only memory of certain sizes at certain offsets can be cached" could you explain?
sklavJust curious jmgv Im not a compiling guru by any stretch of the imagination
sklav;)
sklavAlthough i can impress my friends! lol
jmgvsklav: :)
sapani c
sklavWell to be honest i find it very cool that the linux community as a whole even does irc sessions like this
sh0nX;)
sh0nXheh
* sklav Smokes a cigarette as he reads the messages
sh0nXyou can always expect a huge turnout for Alan's lectures :-)
Heimy;)
jacoboyeah
rpI can't find where this talk is logged at the site
sklavIm surprised he has found the time to give 1
jacoboI wouldn't expect that for one of mine ;)
error27Alan2: I have heard talk about creating ram with intelligence built in so you can do simple operations on it directly instead of using the cpu.   What do you think about that idea?
rpThanks Alan
Heimyrp: this is not logged right now
sarnoldrp: it takes a little while to clean and prettify the logs.. they will be up soon. :)
sh0nXhehe
rpHeimy: oh!
sh0nXsince we're offtopic now: Alan2: Do you have patch for the amd76x_pm module for 2.5.xx?
sarnoldrp: (look in the "congress details" piece...)
jacoborp: you'll be able to find it later in "Congress Details"
docelicit would be nice to separate it into a document without community questions (like, to form an article of what Alan said), and then put the Q/A section separately
Heimyrp: we have to strip some things, and beautify it a bit
Heimyrp: :)
rpoh! great waiting for that
sarnolddocelic: it is
BorZungthe slices docelic ..
rpI missed many things...am very new to irc and Linux
tarzeauoh my god i've missed half of the talks
runlevel0coywofl: Win is *faster* if you throw it from a 9-storey building XD
sapanAlan2: I have an iPAQ with familiar running 2.4.18-rmk - if I were to optimize things in the kernel/apps in general, what should I be looking at?
jacoborp: you'll find previous ones already there, in case you want to have a look :)
rpjacobo: I am just looking
sh0nXAlan2: im trying to port it right now
sh0nXbut insmod is oopsing kernel ;-)
docelicBorZung: well .. yea, kind of.
E0xAlan2 what is your preferred processor ?
sh0nXDec 13 22:22:05 unknown kernel: amd76x_pm: Version amd76x_pmhardware driver0.1.0
sh0nXDec 13 22:22:05 unknown kernel: amd76x_pm: Initializing southbridge Advanced Micro Devic AMD-768 [Opus] ACPI
sh0nXDec 13 22:22:05 unknown kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000026
sh0nX;/
sh0nXif only i can debug this easier
E0xAlan2 what is your preferred processor ?*
sh0nXfor some reason
sh0nXDec 13 23:21:19 unknown kernel: Process insmod (pid: 8180, threadinfo=c8b08000 task=db2e0040)
sh0nXunless modules is still broken in 2.5.51
sh0nX;)
apuigsechIn the GDT table we find many null descriptors (unused); is that to improve the performance of cache usage?
sklavActually i have yet to be able to compile 2.5.x
sklavI keep getting errors in make modules
sarnoldno fan needed? ooooh! :)
Ikarussarnold: well, the faster ones do need them
sh0nXAlan2: What is the best way to do debugging with the kernel, since i can't use UML on hardware, would a vmware work? or do i have to just keep doing the reboot thing ;)
Ikarusbut only tiny ones
mssame for geode's but they are *SLOW* :o)
EleTROnAlan2: speak about Crusoe
ridiculum what's your opinion about itanium2? it's better than hammer?
sarnoldEleTROn: he mentioned them earlier..
debUgo-Alan2: did you attribute the reliability problems in Athlon systems to the CPU itself, the supporting chipsets, or other stuff?
RaD|TzAlan2: How about the Duron processors???
BorZungsh0nX bochs?
EleTROnok
rielneat
sh0nXBorZung: would bochs emulate all hardware?
warHe Al answering these questions via privmsg or a different channcel?
sh0nXbochs
sapanAlan2: [ignore if OT] I often get a "Machine Check Exception 7" on my Athlon which I can't decode even with dj's mce decoder. Any idea?
wars/He/Is
sarnoldwar: #linux
wark
warthx.
davejsapan: mail me the output and I'll take a look.
sh0nXBorZung: i thought it only emulates the core x86, not other things so much
renestill, ia64 is a nice architecture
sarnoldrene: as long as your goal _isn't_ compiles.. :)
sapandavej: hey! thanks a ton :)
sh0nXdavej: I dont see those Exception errors on 2.5.xx anymore.
apuigsechAlan, in the GDT table we can find some null descriptors (not used), is that to gain optimization in cache memory usage?
renesarnold: the chip itself, I mean. :-)
BorZungsh0nX i just hear about it
sh0nXMCE errors
davejsh0nX: interesting.
sh0nXMCE: The hardware reports a non fatal, correctable incident occured on CPU 0. Bank 0: 9409c00000000136
sh0nXsomething in later 2.5 must have fixed something
reneapuigsech: in 2.4, the gaps between per-cpu TSS/LDT are for caches
sh0nX2.5.24
sh0nXwas when i saw an MCE
sh0nXbut thats a while back :)
rene(so that CPUs don't trample on each other's cache lines)
apuigsechTSS and LDT are not used on 2.4
Alanick
reneapuigsech: it sure is :-)
angelLuisOP to alan: MJesus
apuigsech:)
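(A small sketch of the padding trick rene mentions: keeping per-CPU data on separate cache lines so two CPUs never write to the same line and bounce it between their caches, i.e. "false sharing". The 64-byte line size is an assumption for Athlon/P4; older Pentiums use 32 bytes.)

    #define CACHE_LINE 64   /* assumed line size; 32 on PPro/PII/PIII */

    struct percpu_counter {
        long count;
        char pad[CACHE_LINE - sizeof(long)];   /* keep each counter on its own line */
    } __attribute__((aligned(CACHE_LINE)));

    /* one slot per CPU: CPU 0 only touches counters[0], CPU 1 only counters[1],
     * so their cache lines never ping-pong between the two caches */
    static struct percpu_counter counters[2];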
runlevel0>Alan> "...or for windows bug compatibility in the bios", for god's sake, the chips are made thinking about Win bugs XD
runlevel0amazing
sh0nXyou know it's a good night when you have Alan speaking and a Hockey game tonight on tv :))
davejrunlevel0: rumour has it the Pentium IV is model 15 due to a bug in Win NT.
sh0nXit doesn't get better than that :)
runlevel0dajev: X_D
runlevel0my god
angelLuisOP to alan2: sarnold, please
Alan2(getting lots of lag problems)
Alan2I need to vanish very shortly too 8(
sh0nXAlan: do you visit #kernelnewbies? :)
runlevel0ok, im going to leave ;)
tarzeauand #debian ?
runlevel0proud of meeting you Alan and all the rest
runlevel0;)
sklavGuys im off aswell thanks for the enlightening conversation
sh0nXAlan :)
sklavLaterzzzzz
runlevel0bye, will get the logs later
davejfolks interested in the prefetching stuff Alan talked about may find the presentation at http://208.15.46.63/events/gdc2002.htm interesting
Alan2oops
E0x<Alan2> ridiculum: right now I am betting firmly on the hammer < ---- why ?
mulixalan, thanks for the talk... it was very interesting.
mulixand it was cool to see you on IRC :-)
Alan2e0x: it's aimed at the mass market, it's designed to run x86 software
* sh0nX is very pleased with AMD processors (my first one)
sh0nXI especially like the MPX chipset
bevAlan before you leave... do you have any embedded linux devices doing crazy things in your house?
sh0nXhopefully, i can help get some other things implemented from it
sarnold(it appears we lost alan...)
* NiX has been using an AMD Thunderbird for about three years without any stability problems
Heimy<riel> I suppose it's time to close the "official part" of this talk
sh0nXAlan, it was a pleasure seeing you speak today
Heimyargh
Heimy:)
rpAlan is gone???????
sh0nXthank you :)
faikuwhen will the logs be online at umeet site?
apuigsechAlan, what do you think about processors in future? the evolution....
jacobofaiku: later today or tomorrow morning
jacobo%-)
faikuok.thanks
sh0nXhe vanished ;)
angelLuisplas plas plas plas plas plas plas plas plas
war /on ^window "*" { echo $strftime($time() (%I:%M:%S %P)) $1- }
