Development
Remember: Central European Time (CET) is Greenwich Mean Time + 1 (GMT + 1).
TALKING IN #LINUX | TALKING IN #QC
(The log below alternates between the talk channel, #linux, and the questions channel, #qc; channel switches are marked.)
[18:54] *** Now talking in #linux
[18:54] *** Topic is '19 horas CET, Rik van Riel, and Marcelo Tosatti, Linux and High Availability (English and Portugues)'
[18:59] (MJesus)www.barrapunto.com is very interesting now ! also, www.linuxpreview.org
[19:06] (marcelo) Well
[19:06] (marcelo) I think we should start now
[19:06] * DaniT__ informs that, for those who have trouble with English, a simultaneous translation will be attempted in the #media channel
[19:07] (Fernand0) we usually wait five minutes so that people have time to come
[19:07] (marcelo) ok.
[19:07] (Fernand0) thanks :)
[19:09] (MJesus) ATTENTION: in #media there is a simultaneous Spanish translation of this talk
[19:09] (dre) ATTENTION: in #media there is a simultaneous Spanish translation of this talk <---- as far as possible....
[19:10] (Fernand0) well
[19:11] (Fernand0) Hi,
[19:11] (Fernand0) we are very pleased to present you today two kernel hackers who are working
[19:11] (Fernand0) for Conectiva: Rik van Riel and Marcelo Tosatti.
[19:11] (Fernand0) As all of you know, Conectiva [www.conectiva.com, visit them :) ] is a big
[19:11] (Fernand0) Linux company from South America.
[19:11] (Fernand0) About Mr. van Riel, nothing new ;) (he talked here the other day).
[19:11] (Fernand0) Apart from kernel hacking, he also runs the Linux-MM website and the
[19:11] (Fernand0) #kernelnewbies IRC channel on openprojects.net
[19:11] (Fernand0) You can find more about him at: www.surriel.com.
[19:12] (Fernand0) About Mr. Tosatti: he has been working for Conectiva for the last 4 years
[19:12] (Fernand0) He maintains the Conectiva kernel RPM and
[19:12] (Fernand0) has been working in the areas of memory management, drbd (the
[19:12] (Fernand0) distributed redundant block device) and the ext3 filesystem.
[19:12] (Fernand0) In addition to that, he is active on the
[19:12] (Fernand0) Linux High Availability project and has contributed to the heartbeat
[19:12] (Fernand0) software.
[19:12] (Fernand0) You can reach his homepage at:
[19:12] (Fernand0) http://bazar.conectiva.com.br/~marcelo/
[19:12] (Fernand0) They will talk today here about "Linux and High Availability".
[19:12] (Fernand0) The talk will be in English-Portuguese, but we have some volunteers that
[19:12] (Fernand0) are going to translate it on-line to Spanish (in channel #media).
[19:12] (Fernand0) The talk will be here, in the #linux channel; Mr. Riel suggested that we open
[19:12] (Fernand0) another channel (#qc, the questions channel) to write questions during the talk.
[19:12] (Fernand0) Should you have any questions, comments, etc, just write them in #qc
[19:12] (Fernand0) and Mr. Riel will reply.
[19:12] (Fernand0) Thank you to Mr. Riel and Mr. Tosatti for coming here, and also to all of you.
[19:12] (Fernand0) Our thanks are even bigger on this occasion because today's speakers
[19:13] (Fernand0) improvised a presentation for us when they learned that our 'official
[19:13] (Fernand0) speaker' was missing.
[19:13] (Fernand0) Mr. Riel, Mr. Tosatti ...
[19:13] (marcelo) Hi
[19:13] *** riel changes topic to '19 horas CET, Rik van Riel, and Marcelo Tosatti, Linux and High Availability (English and Portugues) || questions? #qc'
[19:13] (riel) good afternoon *
[19:13] (marcelo) So, first I'll start by explaining what High Availability is
[19:13] (riel) today we will be giving a talk about Linux and High Availability (Alta Disponibilidade)
[19:14] (riel) if you have any questions, you can ask them at any time in #qc
[19:14] (riel) this channel is supposed to be silent, except for marcelo and me
[19:14] (riel) marcelo? Could you explain to us what High Availability is?
[19:14] (marcelo) Ok, let's go...
[19:14] (marcelo) First I'll explain the concept of High Availability
[19:15] (marcelo) Availability is the probability that a system is running at a given time
[19:16] (marcelo) A system without any mechanism to enhance its availability is considered to have basic availability
[19:16] (marcelo) This kind of system, in theory, has 99.9% availability.
[19:17] (marcelo) So, during 1 year, it's likely to have 5 days of downtime.
[19:17] (marcelo) When a system has more than 99.999% availability, it's considered to be a high-availability system
[19:19] (marcelo) So, for a system to be considered "highly available" it must, in theory, have at most 5 minutes of downtime during the period of 1 year.
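For reference, the arithmetic behind these availability figures is simple: yearly downtime = (1 - availability) x 8760 hours. A minimal Python sketch of that calculation (added for illustration; not part of the talk itself):

    # Allowed downtime per year for a given availability level.
    HOURS_PER_YEAR = 365 * 24  # 8760

    def downtime_per_year(availability):
        """Yearly downtime, in hours, for an availability fraction."""
        return (1.0 - availability) * HOURS_PER_YEAR

    for level in (0.999, 0.9999, 0.99999):
        hours = downtime_per_year(level)
        print("%.3f%% -> %5.2f hours (%6.1f minutes) per year"
              % (level * 100, hours, hours * 60))
    # 99.900% ->  8.76 hours ( 525.6 minutes) per year
    # 99.990% ->  0.88 hours (  52.6 minutes) per year
    # 99.999% ->  0.09 hours (   5.3 minutes) per year

Note that 99.9% works out to about 8.8 hours per year, so both the "5 days" above and the "5 hours" correction given later in #qc are rough figures; the 5-minute number for 99.999% is right.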
[19:19] (marcelo) The old model of high availability is "fault tolerance" usually hardware-based.
[19:20] (marcelo) Expensive, proprietary.
[19:20] (marcelo) The goal of this old model is to keep the hardware system running
[19:21] (andres) plas
[19:22] (riel) so basically, a single computer is an unreliable piece of shit (relatively speaking) ...
[19:22] (riel) ... and High Availability is the collection of methods to make the job the computer does more reliable
[19:22] (riel) you can do that by better hardware structures
[19:22] (riel) or by better software structures
[19:22] (riel) usually a combination of both
[19:23] (marcelo) the Linux model of high availability is software based.
TALKING IN #QC
[19:11] *** Now talking in #qc
[19:12] *** riel changes topic to 'questions & discussion for the High Availability talk by marcelo & riel'
[19:14] (riel) ok, while marcelo explains in #linux, I am always ready to answer some questions here, if needed
[19:15] (riel) but first, lets listen to marcelo ;)
[19:15] (MJesus)ok
[19:15] (kerniXid) :)
[19:18] (riel) one small correction to marcelo's numbers ... 99.9% is 5 _hours_ of downtime each year (but that's not very important)
[19:19] (MJesus)as winsuck bugenium!
TALKING IN #LINUX
[19:24] (marcelo) Now let me explain some basic concepts of HA
[19:26] (marcelo) First, it's very important that we don't rely on unique hardware components in a High Availability system
[19:26] (marcelo) for example, you can have two network cards connected to a network
[19:26] (marcelo) In case one of the cards fails, the system tries to use the other card.
[19:28] (marcelo) A hardware component that cannot fail because the whole system depends on it is called a "Single Point of Failure"
[19:28] (marcelo) SPOF, to make it short. :)
[19:29] (marcelo) Another important concept which must be known before we continue is "failover"
[19:30] (marcelo) Failover is the process by which one machine takes over the job of another node
[19:31] (riel) "machine" in this context can be anything, btw ...
[19:31] (riel) if a disk fails, another disk will take over
[19:31] (riel) if a machine from a cluster fails, the other machines take over the task
[19:31] (riel) but to have failover, you need to have good software support
[19:31] (riel) because most of the time you will be using standard computer components
[19:32] (marcelo) well, this is all the "theory" needed to explain the next parts.
[19:33] (riel) so let me make a quick condensation of this introduction
[19:33] (riel) 1. normal computers are not reliable enough for some people (like: internet shop), so we need a trick .. umm method ... to make the system more reliable
[19:34] (riel) 2. high availability is the collection of these methods
[19:34] (riel) 3. you can do high availability by using special hardware (very expensive) or by using a combination of normal hardware and software
[19:34] (riel) 4. if one point in the system breaks and it makes the whole system break, that point is a single point of failure .. SPOF
[19:35] (riel) 5. for high availability, you should have no SPOFs ... if one part of the system breaks, another part of the system should take over
[19:35] (riel) (this is called "failover")
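Point 5 can also be put in numbers: if components fail independently, a redundant pair is down only when both members are down at once, so its combined availability is 1 - (1 - a)^2. A small sketch (the independence assumption is an idealization real systems only approximate):

    # Availability of N redundant copies, assuming independent failures:
    # the whole group is down only when every copy is down at once.
    def redundant_availability(a, copies=2):
        return 1.0 - (1.0 - a) ** copies

    print(redundant_availability(0.999))     # 0.999999: two "basic" parts
    print(redundant_availability(0.999, 3))  # 0.999999999 with three copies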
[19:35] (riel) now I think we should explain a bit about how high availability works .. the technical side
[19:36] (riel) umm wait ... sorry marcelo ;)
[19:36] (marcelo) ok
[19:37] (marcelo) Lets talk about the basic components of HA
[19:37] (marcelo) Or at least some of them,
[19:38] (marcelo) A simple disk running a filesystem is clearly an SPOF
[19:39] (marcelo) If the disk fails, every part of the system which depends on the data contained on it will stop.
[19:41] (marcelo) To keep a disk from being a SPOF of the system, RAID can be used.
[19:42] (marcelo) RAID-1, which is a feature of the Linux kernel...
[19:43] (marcelo) Allows "mirroring" of all data on the RAID device to a given number of disks...
[19:44] (marcelo) So, when data is written to the RAID device, it's replicated between all disks which are part of the RAID1 array.
[19:44] (marcelo) This way, if one disk fails, the other disk (or disks) in the RAID1 array will be able to continue working
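The behaviour marcelo describes (every write replicated to all member disks, reads served by any surviving member) can be sketched in a few lines of Python. This is a toy model of the RAID1 idea only, not the kernel's md driver:

    # Toy model of RAID1 mirroring: each write goes to every working
    # member disk; a read succeeds while at least one member survives.
    class Raid1:
        def __init__(self, count):
            self.disks = [dict() for _ in range(count)]  # block -> data
            self.failed = set()                          # indexes of dead disks

        def write(self, block, data):
            for i, disk in enumerate(self.disks):
                if i not in self.failed:
                    disk[block] = data                   # replicate everywhere

        def read(self, block):
            for i, disk in enumerate(self.disks):
                if i not in self.failed and block in disk:
                    return disk[block]                   # any surviving copy
            raise IOError("all mirrors lost")

    array = Raid1(2)
    array.write(0, b"important data")
    array.failed.add(0)              # simulate the first disk dying
    print(array.read(0))             # still served from the second mirror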
TALKING IN #QC
[19:24] (dre) marcelo) the Linux model of high availability is software based. <---- but it has to have some hardware restrictions ?
[19:24] (riel) dre: indeed
[19:24] (marcelo) no
[19:25] (riel) dre: you have to have a little bit of hardware support, but not much ... we will explain this later
[19:25] (dre) ok
[19:26] (BASH) what is a High Availability system ?
[19:26] (dre) BASH: reload the log
[19:26] (lclaudio) you can also use additional hardware to enhance your HA solution
[19:27] (BASH) reload the log?
[19:27] (riel) BASH: a "system" is the collection of hardware+software to give you a particular service ... say a web site
[19:27] (riel) BASH: if any part of the system breaks, the system is dead
[19:27] (BASH) then?
[19:27] (riel) BASH: in high availability, you have a system which breaks less often ... say, a maximum of 5 minutes a year
[19:28] (riel) BASH: while a "normal" system is sometimes down for more than 5 _days_ a year
[19:28] (BASH) oic
[19:28] (BASH) so what is marcelo trying to explain?
[19:28] (riel) BASH: the reasons behind this
[19:29] (BASH) ic
TALKING IN #LINUX
[19:45] (riel) because the system has a copy of the data on each disk
[19:45] (riel) and can just use the other copies of the data
[19:45] (riel) this is another example of "failover" ... when one component fails, another component is used to fulfill this function
[19:46] (riel) and the system administrator can replace (or reformat/reboot/...) the wrong component
[19:46] (riel) this looks really simple when you don't look at it too much
[19:46] (riel) but there is one big problem ... when do you need to do failover?
[19:47] (riel) in some situations, you would have _2_ machines working at the same time and corrupting all data ... when you are not careful
[19:47] (riel) think for example of 2 machines which are fileservers for the same data
[19:47] (riel) at any time, one of the machines is working and the other is on standby
[19:47] (riel) when the main machine fails, the standby machine takes over
[19:47] (riel) ... BUT ...
[19:48] (riel) what if the standby machine only _thinks_ the main machine is dead and both machines do something with the data?
[19:48] (riel) which copy of the data is right, which copy of the data is wrong?
[19:48] (riel) or worse ... what if _both_ copies of the data are wrong?
[19:49] (riel) for this, there is a special kind of program, called a "heartbeating" program, which checks which parts of the system are alive
[19:49] (riel) for Linux, one of these programs is called "heartbeat" ... marcelo and lclaudio have helped writing this program
[19:49] (riel) marcelo: could you tell us some of the things "heartbeat" does?
[19:49] (marcelo) sure
[19:50] (marcelo) "heartbeat" is a piece of software which monitors the availability of nodes
[19:50] (marcelo) it "pings" the node which it wants to monitor, and, in case this node doesnt answer the "pings", it considers it to be dead.
[19:51] (marcelo) when a node is considered to be dead when can failover the services which it was running
[19:51] (marcelo) the services which we takeover are previously configured in both systems.
[19:52] (marcelo) Currently heartbeat works only with 2 nodes.
[19:53] (marcelo) Its been used in production environments in a lot of situations...
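The monitoring loop marcelo describes is conceptually simple. A minimal Python sketch of a heartbeat-style monitor, where the peer hostname and the takeover script path are made-up examples (the real heartbeat package is driven by its own configuration files, not written like this):

    # Heartbeat-style monitoring sketch: declare the peer dead after
    # MAX_MISSES failed pings, then take over its configured services.
    import subprocess
    import time

    PEER = "node2.example.com"              # hypothetical peer node
    TAKEOVER = "/usr/local/sbin/takeover"   # hypothetical failover script
    MAX_MISSES = 3

    misses = 0
    while misses < MAX_MISSES:
        # one ping, 2-second timeout (Linux iputils flags)
        ok = subprocess.call(["ping", "-c", "1", "-W", "2", PEER],
                             stdout=subprocess.DEVNULL) == 0
        misses = 0 if ok else misses + 1
        time.sleep(1)

    subprocess.call([TAKEOVER])             # peer presumed dead: fail over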
[19:54] (riel) there is one small problem, however
[19:54] (riel) what if the cleaning lady takes away the network cable between the cluster nodes by accident?
[19:54] (riel) and both nodes *think* they are the only one alive?
[19:54] (riel) ... and both nodes start messing with the data...
[19:55] (riel) unfortunately there is no way you can prevent this 100%
[19:55] (riel) but you can increase the reliability by simply having multiple means of communication
[19:55] (riel) say, 2 network cables and a serial cable
[19:56] (riel) and this is reliable enough that the failure of 1 component still allows good communication between the nodes
[19:56] (riel) so they can reliably tell if the other node is alive or not
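In sketch form, riel's rule is: declare the peer dead only when it is unreachable over every channel, so one broken cable cannot trigger a takeover. The probe functions below are hypothetical placeholders for the two network links and the serial line:

    # Split-brain guard: the peer counts as dead only if *all*
    # communication channels fail at the same time.
    def peer_is_dead(probes):
        """probes: callables that return True when the peer answers."""
        return not any(probe() for probe in probes)

    # Hypothetical probes: both network links answer, serial does not.
    check_eth0 = lambda: True
    check_eth1 = lambda: True
    check_serial = lambda: False

    print(peer_is_dead([check_eth0, check_eth1, check_serial]))  # False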
[19:57] (riel) this was the introduction to HA
[19:57] (riel) now we will give some examples of HA software on Linux
[19:57] (riel) and show you how they are used ...
[19:58] (riel) ... we will wait shortly until the people doing the translation to Spanish have caught up ... ;)
[19:58] (marcelo) Ok
[19:58] (marcelo) Now lets talk about the available software for Linux
[20:02] (riel) .. ok, the translators have caught up .. we can continue again ;)
[20:02] (marcelo) Note that I'll be talking about the opensource software for Linux
[20:03] (marcelo) As I said above, the "heartbeat" program provides monitoring and basic failover of services
[20:03] (marcelo) for two nodes only
[20:04] (marcelo) As a practical example...
[20:04] (marcelo) The web server at Conectiva (www.conectiva.com.br) has a standby node running heartbeat
TALKING IN #QC
[19:45] (arjan) does this assume the device (disk) knows when it fails ?
[19:45] (marcelo) no
[19:46] (arjan) how does it handle a disk giving bad data then ?
[19:46] (dana) Are there going to be logs of this posted? (It's interesting but I gotta work) :(
[19:47] (lclaudio) Last time it was merged with the main talk on the webpage... I think dre has done the job :)
[19:48] (oroz) dana: yes... in http://umeet.uninet.edu
[19:48] (oroz) tomorrow
[19:58] (arjan) too bad you can't have a "vote out" with only 2 systems in a cluster
[19:59] (riel) arjan: quorum in a 2-node system is a project, yes
[20:00] (riel) arjan: but that's a bit too difficult a problem for this talk ;)
[20:00] (lclaudio) debUgo-: heartbeat is the name of the software tool and the name of the technique used by the tool.
TALKING IN #LINUX
[20:05] (marcelo) In case our primary web server fails, the standby node will detect that and start the apache daemon
[20:05] (marcelo) making the service available again
[20:05] (marcelo) any service can be used, in theory, with heartbeat.
[20:05] (riel) so if one machine breaks, everybody can still go to our website ;)
[20:05] (marcelo) It only depends on the init scripts to start the service
[20:06] (marcelo) So any service which has an init script can be used with heartbeat
[20:06] (marcelo) arjan asked if it takes over the IP address
[20:07] (marcelo) There is a virtual IP address used by the service
[20:07] (marcelo) which is the "virtual server" IP address.
[20:07] (marcelo) So, in our webserver case...
[20:08] (marcelo) the real IP address of the first node is not used by the apache daemon
[20:08] (marcelo) but the virtual IP address, which will be taken over by the standby node in case failover happens
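On the standby node, "taking over" boils down to two steps: claim the virtual IP as an extra address on an interface, then start the service's init script. A hedged sketch of those steps; the address, interface and script path are illustrative examples, and the real heartbeat package drives this from its own configuration rather than from a script like this:

    # Failover action sketch: bring up the virtual service IP, then
    # start the service.  All values below are illustrative only.
    import subprocess

    VIRTUAL_IP = "10.0.0.100/24"      # hypothetical "virtual server" address
    INTERFACE = "eth0"
    INIT_SCRIPT = "/etc/init.d/apache"

    # add the virtual IP as an address alias (iproute2)
    subprocess.check_call(["ip", "addr", "add", VIRTUAL_IP, "dev", INTERFACE])
    # start the service the failed node was running
    subprocess.check_call([INIT_SCRIPT, "start"])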
[20:09] (marcelo) Heartbeat, however, is limited to two nodes.
[20:10] (marcelo) This is a big problem for a lot of big systems.
[20:11] (marcelo) SGI has ported its FailSafe HA system to Linux recently (http://oss.sgi.com/projects/failsafe)
[20:11] (marcelo) FailSafe is a complete cluster manager which supports up to 16 nodes.
[20:11] (marcelo) Right now it's not ready for production environments
[20:12] (marcelo) But that's being worked on by the Linux HA project people :)
[20:12] (marcelo) SGI's FailSafe is GPL.
[20:13] (riel) another type of clustering is LVS ... the Linux Virtual Server project
[20:13] (riel) LVS uses a very different approach to clustering
[20:13] (riel) you have 1 (maybe 2) machines that receive http (www) requests
[20:14] (riel) but those machines don't do anything, except send the requests to a whole bunch of machines that do the real work
[20:14] (riel) so called "working nodes"
[20:14] (riel) if one (or even more) of the working nodes fail, the others will do the work
[20:14] (riel) and all the routers (the machines sitting at the front) do is:
[20:15] (riel) 1. keep track of which working nodes are available
[20:15] (riel) 2. give the http requests to the working nodes
[20:15] (riel) the kernel needs a special TCP/IP patch and a set of usermode utilities for this to work
[20:16] (riel) RedHat's "piranha" tool is a configuration tool for LVS, that people can use to set up LVS clusters in an easier way
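The router's job, reduced to a toy: keep a list of working nodes and hand each incoming request to the next live one. The real balancing happens inside the kernel (LVS is configured with the ipvsadm utility), so this Python round-robin dispatcher only illustrates the idea; the node names are hypothetical:

    # Toy round-robin dispatcher in the spirit of LVS: each request goes
    # to the next working node; failed nodes are skipped.
    import itertools

    class Dispatcher:
        def __init__(self, nodes):
            self.nodes = nodes
            self.failed = set()
            self._cycle = itertools.cycle(nodes)

        def pick_node(self):
            for _ in range(len(self.nodes)):
                node = next(self._cycle)
                if node not in self.failed:
                    return node
            raise RuntimeError("no working nodes left")

    lvs = Dispatcher(["real1", "real2", "real3"])   # hypothetical node names
    lvs.failed.add("real2")                         # real2 stops answering
    print([lvs.pick_node() for _ in range(4)])      # real2 is never chosen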
[20:16] (riel) in Conectiva, we are also working on a very nice HA project
[20:17] (riel) the project marcelo and Olive are working on is called "drbd"
[20:17] (riel) the distributed redundant block device
[20:17] (riel) this is almost the same as RAID1, only over the network
[20:17] (riel) to go back to RAID1 (mirroring) ... RAID1 is using 2 (or more) disks to store your data
[20:17] (riel) with one copy of the data on every disk
[20:18] (riel) drbd extends this idea to use disks on different machines on the network
[20:18] (riel) so if one disk (on one machine) fails, the other machines still have the data
[20:18] (riel) and if one complete machine fails, the data is on another machine ... and the system as a whole continues to run
[20:19] (riel) if you use this together with ext3 or reiserfs, the machine that is still running can very quickly take over the filesystem that it has copied to its own disk
[20:19] (riel) and your programs can continue to run
[20:20] (riel) (with ext2, you would have to do an fsck first, which can take a long time)
[20:20] (riel) this can be used for fileservers, databases, webservers, ...
[20:20] (riel) everything where you need the very latest data to work
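drbd's "RAID1 over the network", reduced to its essence: apply every write locally and also ship it to the peer before considering it done, so either machine holds current data when the other dies. The sketch below shows the principle only; the network hop is faked with a function call, and drbd's actual protocol is far more involved:

    # Principle behind drbd: every write lands on the local disk *and*
    # on a peer node's disk before it is acknowledged.
    local_disk = {}
    peer_disk = {}              # in reality this lives on another machine

    def send_to_peer(block, data):
        peer_disk[block] = data          # stands in for the network protocol

    def replicated_write(block, data):
        local_disk[block] = data         # local copy
        send_to_peer(block, data)        # remote copy, then acknowledge

    replicated_write(7, b"journal entry")
    assert local_disk[7] == peer_disk[7]   # both nodes hold current data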
[20:20] (riel) ...
[20:21] (riel) this is the end of our part of the lecture, if you have any questions, you can ask them and we will try to give you a good answer ;)
[20:21] (Fernand0) plas plas plas plas plas plas plas plas
[20:21] (Fernand0) plas plas plas plas plas plas plas plas
[20:21] (riel) [btw, this whole lecture was improvised by marcelo and me ... sorry if it was a bit messy at times ;)]
[20:21] (Fernand0) plas plas plas plas plas plas plas plas
[20:21] (MJesus) clap clap clap clap clap clap clap clap clap clap
[20:21] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[20:21] (MJesus) clap clap clap clap clap clap clap clap clap clap
TALKING IN #QC
[20:05] (arjan) does it also take over the IP address ?
[20:06] (riel) arjan: I guess so ... ;)
TALKING IN #LINUX
[20:22] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[20:22] (MJesus) clap clap clap clap clap clap clap clap clap clap
[20:22] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[20:22] (MJesus) clap clap clap clap clap clap clap clap clap clap
[20:22] (Fernand0) plas plas plas plas plas plas plas plas
[20:22] (Fernand0) questions here or in #qc riel ??
[20:22] (riel) Fernand0: I think we just got a question in #qc, so lets continue there ;)
[20:22] (Fernand0) nice
[20:23] (Fernand0) questions and comments in #qc
[20:23] (marcelo) arjan asked "are IS departments actually happy with the IP failover ?"
[20:23] (marcelo) Yes.
[20:23] (marcelo) :)
[20:23] (riel) arjan: if you need high availability, IP takeover should not be a problem
[20:25] (riel) any other questions ?
[20:26] (riel) if anybody wants more information, you can go to http://www.linux-ha.org/
TALKING IN #QC
[20:22] (arjan) are IS departments actually happy with the IP failover ?
[20:23] (arjan) hehe
[20:23] (arjan) not where I work :)
[20:23] (marcelo) why?
[20:23] (arjan) they seem to be unable to "debug" the network because
[20:24] (arjan) of the "takeovers"
[20:24] (riel) arjan: that's their problem
[20:24] (marcelo) I dont see the problem
[20:24] (riel) arjan: no HA for your IS people, then .. ;)
[20:24] (arjan) riel: and not for my servers either
[20:24] (riel) marcelo: there is no problem
[20:24] (riel) marcelo: it is a management thing to not allow IP takeover
[20:25] (riel) any other questions ?
[20:26] (riel) questions and random discussion can always continue here
[20:27] (Fernand0) well, a question ....
[20:27] (Fernand0) a diffuse question, i'm afraid
[20:27] * riel listens
[20:27] (Fernand0) HA is about having the right information in the right place, isn't it?
[20:27] (Fernand0) at least part of the problem is this, if I understand it well
[20:28] (Fernand0) can it be related to the memory problems presented in the talk of the other day ?
[20:28] (Fernand0) or are they completely different problems?
[20:30] (debUgo-) riel: drbd is based on nbd?
[20:30] (marcelo) debugo-, no, it's not.
[20:30] (riel) Fernand0: they are very different problems
[20:31] (debUgo-) marcelo: ok, thx
[20:31] (marcelo) debugo, it's similar to nbd in some ways...
[20:31] (marcelo) but it's not based on nbd
[20:33] (debUgo-) marcelo: do you think that InterMezzo and other GPL'ed distributed filesystems are stable enough for production HA environments?
[20:34] (riel) debUgo-: InterMezzo is almost ready, from what I know
[20:34] (marcelo) debugo, not yet.
[20:34] (marcelo) GFS is quite close to being usable in production environments, though.
[20:43] (debUgo-) riel/marcelo: has you ever checked IBM's AFS or EVMS? Did you think that this contribution from IBM would make Linux stronger?
[20:43] (debUgo-) if grammar sux, it's not my fault :)
[20:44] (riel) debUgo-: I haven't used AFS, but have heard good things about it
[20:44] (riel) debUgo-: IBM promised they would release AFS in October (September?) 2000
[20:44] (marcelo) riel, they released already :)
[20:44] (debUgo-) well... there is OpenAFS
[20:44] (riel) debUgo-: but they have not released anything yet or told anybody what is happening ...
[20:44] (marcelo) I haven't looked at it yet, though.
[20:44] (riel) marcelo: they did?
[20:44] (riel) marcelo: cool
[20:44] (marcelo) debugo, what is EVMS?
[20:44] (debUgo-) riel: http://oss.software.ibm.com/developerworks/opensource/afs/ go and hack it =)
[20:44] (riel) marcelo: last month, it wasn't released yet ;)
[20:45] (marcelo) ls
[20:45] (riel) debUgo-: not me...
[20:45] (marcelo) idiot
[20:45] (debUgo-) EVMS: Enterprise Volume Management System
[20:45] (riel) debUgo-: but I'm looking forward to your patches
[20:45] (debUgo-) wow
[20:45] (marcelo) debugo, LVM 1.0 will include a clustered LVM
[20:45] (riel) debUgo-: ahh, yes
[20:45] (riel) debUgo-: EVMS is a useful tool
[20:45] (marcelo) and I don't know if IBM wants to make EVMS open source
[20:46] (trusmis) clustered lvm?
[20:46] (riel) debUgo-: but I don't know if it will be better than the linux LVM or not
[20:46] (riel) marcelo: they will
[20:46] (marcelo) trusmis, yes.
[20:46] (riel) marcelo: I talked about it with an EVMS developer in Miami
[20:47] (marcelo) trusmis, so you can, for example, have different nodes mess (resize, etc) with a shared device
[20:47] (marcelo) without conflicts
[20:47] (debUgo-) seems pretty cool...
[20:47] (debUgo-) so, i just need a couple of machines to test :o/
[20:47] (riel) debUgo-: ;)
[20:47] (trusmis) marcelo: in one computer or clustered beetween several computers?
[20:48] (debUgo-) trusmis: clustered... over network
[20:48] (riel) trusmis: clustered nodes ... but with _shared_ disks
[20:48] (riel) trusmis: a bunch of disks are one part of the "network"
[20:48] (riel) trusmis: and some CPU+memory+... are the other parts of the "network"
[20:48] (marcelo) debugo, not really
[20:49] (marcelo) the access is done via shared SCSI or fibre channel.
[20:49] (riel) trusmis: and you have multiple computers using the same disk(s)
[20:49] (debUgo-) marcelo: wow! that's really cool... some companies would trash their WinNT solutions =)
[20:53] (debUgo-) so that 3 computers can access data simultaneously?
[20:54] (riel) trusmis: with all those computers talking to the same disk
[20:54] (riel) debUgo-: what is so difficult about that? ;)
[20:54] (riel) debUgo-: yes
[20:54] (debUgo-) i'm thinking about postgresql...
[20:54] (riel) debUgo-: and the difficult part in software is to make sure they don't corrupt each other's data
[20:55] (debUgo-) each machine runs a postgresql instance, and all 3 machines read the -same- database
[20:55] (Ricardo) debUgo-: you mean the same physical representation of the database.
[20:55] (riel) debUgo-: in that case, postgresql has to make sure it doesn't mess with its own data
[20:55] (riel) Ricardo: nope
[20:56] (riel) Ricardo: we mean "1 disk shared by 3 computers"
[20:56] (Ricardo) riel: I meant "physical" ;)
[20:56] (Ricardo) riel: Well, 'logical' seems more adequate :)
[20:56] (riel) Ricardo: wrong
[20:57] (Ricardo) Uh
[20:57] (riel) Ricardo: 1 _physical_ disk
[20:57] (Ricardo) Well
[20:57] (debUgo-) gotta go... boss seems angry
[20:57] (Ricardo) I missed something :?
[20:57] (riel) Ricardo: shared by _3_ computers
[20:57] (debUgo-) =P
[20:57] (riel) debUgo-: bye
[20:57] (Ricardo) Ok
[20:57] (riel) Ricardo: you seem to be confusing logical and physical ;)
[20:57] (debUgo-) thanks riel, marcelo
[20:57] (trusmis) my point: what's the difference from NFS?
[20:57] (debUgo-) bye *
[20:57] (MJesus) thanks, debugo
[20:58] (riel) trusmis: with NFS, if the server is down nobody can reach the data
[20:58] (Ricardo) riel: So I was right the first time. The three servers are going to access the physical rep. (files) of the DB :)
[20:58] (riel) Ricardo: yes
[20:59] (Ricardo) riel: well :)
[20:59] (trusmis) so the device is not connected to a computer? (seems strange)
[20:59] (riel) trusmis: the disk is connected to computers all right
[20:59] (riel) trusmis: not just 1, but _3_
[21:00] (trusmis) physically connected? i mean 3 connectors?
[21:00] (riel) trusmis: no
[21:00] (riel) trusmis: 1 connector on the disk, 1 connector on each computer
[21:01] (riel) trusmis: and 1 cable going to all 4 connectors
[21:01] (riel) trusmis: how is this different from 1 computer with 3 disks and 1 cable going to 4 connectors? ;)
[21:01] (trusmis) ok, got it
[21:03] (trusmis) and that's very expensive?
[21:06] (riel) trusmis: depends
[21:07] (trusmis) on what?
[21:07] (trusmis) well, i imagine on what
[21:07] (trusmis) but, compared with ide devices, is it 3-4 times more ?
[21:07] (riel) trusmis: possibly
[21:08] (riel) trusmis: but I don't know for sure ... I never buy those things ;)
[21:09] (lclaudio) Marcelo told me that a "host adapter" for fibre channel may cost US$500
[21:09] (trusmis) i have missed the talk but i imagine you have talked about distributed computing
[21:09] (marcelo) no
[21:09] (marcelo) 300$
[21:10] (marcelo) a new, very good one.
[21:10] (lclaudio) ooops... :) and you need one of them for each host...
[21:11] (MJesus)300*200 = 60000 pts uh! only? ????
[21:12] (trusmis) any idea whether something that distributes the load (computing) between computers, or something like that, is going to be included in the linux kernel?
[21:12] (trusmis) not only devices but even distributing the processes
[21:13] (riel) trusmis: I don't know
[21:13] (riel) trusmis: some things, maybe
[21:14] (riel) trusmis: other things are just too complex for the standard Linux kernel
[21:14] (oroz) bye!
[21:14] (Ricardo) bye oroz
[21:14] (trusmis) i think an http server is too complex too
[21:15] (lclaudio) trusmis: there are some efforts on load distribution for services, like LVS.
[21:15] (trusmis) yes i know
[21:16] (trusmis) i just wonder if in lkml had been any thread about it
[21:16] (lclaudio) Linux isn't a distributed system at all... there are the Mosix people trying to develop a distributed version of Linux (IIRC) and the default tools for parallel processing (PVM, MPI,...)
[21:17] (trusmis) lclaudio: i know all these things, i want to know about any effort i don't know about inside kernel hackers
[21:20] (trusmis) can linux now be considered a good high-availability system, or does it still need many things ?
[21:20] (lclaudio) trusmis: so... no news :)
[21:22] (lclaudio) trusmis: it's kinda difficult to answer this question... there are lots of scattered tools for HA and now the community is working on getting them working together.
[21:23] (lclaudio) trusmis: I think the answer is yes. Aside from some points you can't address right now, you can have a good level of HA with the currently available tools
[21:23] (lclaudio) trusmis: it depends mainly on what you wanna do.
[21:23] (trusmis) what's the thing you think linux lacks most
[21:23] (trusmis) about HA, of course
[21:25] (lclaudio) Right now I miss a distributed filesystem. There are some projects near production level, but "near" doesn't suffice for HA.
[21:26] (lclaudio) A good distributed filesystem would make the file consistency and coherence problem simpler to deal with.
[21:26] (Ricardo) Chris Isaak-Wicked Game.mp3:
[21:26] (Ricardo) Ups
[21:26] (Ricardo) O:)
[21:26] (Ricardo) Sorry
[21:26] (trusmis) you mean "i can have /usr on another computer??" or "i can access another computer's hd in /name ?"
[21:26] (Ricardo) Now the right one (damned clipboard)
[21:26] (Ricardo) Uhh... I'm afraid I have to cut this interesting talk, people :) We have been here for nearly two and a half hours. I think we "officially" close this lecture for today.
[21:27] (Ricardo) Of course, you can keep talking :)
[21:27] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[21:27] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[21:27] (MJesus) clap clap clap clap clap clap clap clap clap clap
[21:27] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[21:27] (MJesus) clap clap clap clap clap clap clap clap clap clap
[21:27] (Ricardo) It's only a "bureaucratic" thing :)
[21:27] (MJesus)hehehhehehe and for the clap clap
[21:27] (MJesus) clap clap clap clap clap clap clap clap clap clap
[21:27] (MJesus)plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[21:28] (lclaudio) trusmis: I mean you can have all your files distributed and replicated over your network. Reliably and safely.
[21:28] * lclaudio handclaps :)
[21:28] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[21:28] (MJesus) clap clap clap clap clap clap clap clap clap clap
[21:28] (MJesus) plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas plas
[21:28] (MJesus) clap clap clap clap clap clap clap clap clap clap
[21:29] (MJesus) sorry for the break
[21:29] (trusmis) like nfs but with all your HD and over the whole network
[21:29] (trusmis) ?
[21:29] (trusmis) interesting and difficult to do
[21:29] (lclaudio) Not like NFS... completely different.
[21:30] (trusmis) the problem is that then you may have a lot of replication..........
[21:30] (lclaudio) trusmis: go to linux-ha.org and take a look at InterMezzo, Coda, M2FS, ...
[21:31] (trusmis) ah, ok
[21:31] (MJesus) I introduce trusmis..... he is also a linux developer
[21:31] (lclaudio) trusmis: no. There are lots of good algorithms on this subject, designed to avoid heavy network traffic and such
[21:32] (lclaudio) MJesus: excuse me for breaking your "clap plas" rainbow :)
[21:32] (trusmis) lot of replication= i don't mind if you switch off that computer
[21:33] * trusmis is really interested and is going to look into it
Contact: