
Umeet2000 talk transcript: Rik van Riel and Marcelo Tosatti, "Linux and High Availability"

Remember: Central European Time (CET) is Greenwich Mean Time + 1 (GMT + 1).

The talk took place in #linux; questions were handled in parallel in #qc. The two channel logs are interleaved below.

[18:54] *** Now talking in #linux
[18:54] *** Topic is '19:00 CET, Rik van Riel and Marcelo Tosatti, Linux and High Availability (English and Portuguese)'
[18:59] (MJesus) www.barrapunto.com is very interesting now! also, www.linuxpreview.org
[19:06] (marcelo) Well
[19:06] (marcelo) I think we should start now
[19:06] * DaniT__ informs everyone that, for those having trouble with English, a simultaneous translation will be attempted in channel #media
[19:07] (Fernand0) we usually wait five minutes so that people have time to come
[19:07] (marcelo) ok.
[19:07] (Fernand0) thanks :)
[19:09] (MJesus) ATTENTION: in #media there is a simultaneous Spanish translation of this talk
[19:09] (dre) ATTENTION: in #media there is a simultaneous Spanish translation of this talk <---- as far as possible....

[19:10] (Fernand0) well
[19:11] (Fernand0) Hi,
[19:11] (Fernand0) we are very pleased to present you today two kernel hackers who are working
[19:11] (Fernand0) for Conectiva: Rik van Riel and Marcelo Tosatti.
[19:11] (Fernand0) As all of you know, Conectiva [www.conectiva.com, visit them :) ] is a big
[19:11] (Fernand0) Linux company from South America.
[19:11] (Fernand0) About Mr. van Riel, nothing new ;) (he talked here the other day).
[19:11] (Fernand0) Apart from kernel hacking, he also runs the Linux-MM website and the
[19:11] (Fernand0) #kernelnewbies IRC channel on openprojects.net
[19:11] (Fernand0) You can find more about him at: www.surriel.com.
[19:12] (Fernand0) About Mr. Tosatti: he has been working for Conectiva for the last 4 years
[19:12] (Fernand0) He maintains the Conectiva kernel RPM and
[19:12] (Fernand0) has been working in the areas of memory management, drbd (the
[19:12] (Fernand0) distributed redundant block device) and the ext3 filesystem.
[19:12] (Fernand0) In addition to that, he is active on the
[19:12] (Fernand0) Linux High Availability project and has contributed to the heartbeat
[19:12] (Fernand0) software.
[19:12] (Fernand0) You can reach his homepage at:
[19:12] (Fernand0) http://bazar.conectiva.com.br/~marcelo/
[19:12] (Fernand0) They will talk today here about "Linux and High Availability".
[19:12] (Fernand0) The talk will be in English-Portuguese, but we have some volunteers that
[19:12] (Fernand0) are going to translate it on-line to Spanish (in channel #media).
[19:12] (Fernand0) The talk will be here, in the #linux channel; Mr. Riel suggested that we open
[19:12] (Fernand0) another channel (#qc, a questions channel) to write questions in during the talk.
[19:12] (Fernand0) Should you have any questions, comments, etc, just write them in #qc
[19:12] (Fernand0) and Mr. Riel will reply.
[19:12] (Fernand0) Thank you to Mr. Riel and Mr. Tosatti for coming here, and also to all of you.
[19:12] (Fernand0) Our thanks are bigger on this occasion because today's speakers
[19:13] (Fernand0) improvised a presentation for us when they learned that our 'official
[19:13] (Fernand0) speaker' was missing.
[19:13] (Fernand0) Mr. Riel, Mr. Tosatti ...
[19:13] (marcelo) Hi

[19:13] *** riel changes topic to '19:00 CET, Rik van Riel and Marcelo Tosatti, Linux and High Availability (English and Portuguese) || questions? #qc'
[19:13] (riel) good afternoon *
[19:13] (marcelo) So, first I'll start by explaining what High Availability is
[19:13] (riel) today we will be giving a talk about Linux and High Availability (Alta Disponibilidade)
[19:14] (riel) if you have any questions, you can ask them at any time in #qc
[19:14] (riel) this channel is supposed to be silent, except for marcelo and me
[19:14] (riel) marcelo? Could you explain to us what High Availability is?
[19:14] (marcelo) Ok, let's go...
[19:14] (marcelo) First I'll explain the concept of High Availability
[19:15] (marcelo) Availability is the probability that a system is running at a given time
[19:16] (marcelo) A system without any mechanism to enhance its availability is considered to have basic availability
[19:16] (marcelo) This kind of system, in theory, has 99.9% availability.
[19:17] (marcelo) So, during 1 year, it's likely to have 5 days of downtime.
[19:17] (marcelo) When a system has more than 99.999% availability, it's considered to be a high availability system
[19:19] (marcelo) So, for a system to be considered "highly available" it must, in theory, have at most 5 minutes of downtime during the period of 1 year.
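The arithmetic behind these "nines" is easy to check with a short Python sketch (not part of the session; note that riel later corrects in #qc that 99.9% works out to hours, not days, of downtime per year):

```python
# Downtime per year implied by an availability figure.
# 99.9% ("three nines") allows roughly 8.8 hours of downtime a year;
# 99.999% ("five nines") allows roughly 5 minutes.

HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability: float) -> float:
    """Hours per year a system with the given availability may be down."""
    return (1.0 - availability) * HOURS_PER_YEAR

if __name__ == "__main__":
    for nines in (0.999, 0.9999, 0.99999):
        print(f"{nines:.3%} available -> "
              f"{downtime_hours_per_year(nines):.2f} h/year down")
```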
[19:19] (marcelo) The old model of high availability is "fault tolerance" usually hardware-based.
[19:20] (marcelo) Expensive, proprietary.
[19:20] (marcelo) The goal of this old model is to keep the hardware system running
[19:21] (andres) plas
[19:22] (riel) so basically, a single computer is an unreliable piece of shit (relatively speaking) ...
[19:22] (riel) ... and High Availability is the collection of methods to make the job the computer does more reliable
[19:22] (riel) you can do that by better hardware structures
[19:22] (riel) or by better software structures
[19:22] (riel) usually a combination of both
[19:23] (marcelo) the Linux model of high availability is software based.

[19:11] *** Now talking in #qc
[19:12] *** riel changes topic to 'questions & discussion for the High Availability talk by marcelo & riel'
[19:14] (riel) ok, while marcelo explains in #linux, I am always ready to answer some questions here, if needed
[19:15] (riel) but first, let's listen to marcelo ;)
[19:15] (MJesus)ok
[19:15] (kerniXid) :)
[19:18] (riel) one small correction to marcelo's numbers ... 99.9% is 5 _hours_ of downtime each year (but that's not very important)
[19:19] (MJesus)as winsuck bugenium!

[19:24] (marcelo) Now let me explain some basic concepts of HA
[19:26] (marcelo) First, it's very important that we don't rely on unique hardware components in a High Availability system
[19:26] (marcelo) for example, you can have two network cards connected to a network
[19:26] (marcelo) In case one of the cards fails, the system tries to use the other card.
[19:28] (marcelo) A hardware component that cannot fail because the whole system depends on it is called a "Single Point of Failure"
[19:28] (marcelo) SPOF, to make it short. :)
[19:29] (marcelo) Another important concept which must be known before we continue is "failover"
[19:30] (marcelo) Failover is the process by which one machine takes over the job of another node
[19:31] (riel) "machine" in this context can be anything, btw ...
[19:31] (riel) if a disk fails, another disk will take over
[19:31] (riel) if a machine from a cluster fails, the other machines take over the task
[19:31] (riel) but to have failover, you need to have good software support
[19:31] (riel) because most of the time you will be using standard computer components
[19:32] (marcelo) well, this is all the "theory" needed to explain the next parts.
[19:33] (riel) so let me make a quick condensation of this introduction
[19:33] (riel) 1. normal computers are not reliable enough for some people (like: internet shop), so we need a trick .. umm method ... to make the system more reliable
[19:34] (riel) 2. high availability is the collection of these methods
[19:34] (riel) 3. you can do high availability by using special hardware (very expensive) or by using a combination of normal hardware and software
[19:34] (riel) 4. if one point in the system breaks and it makes the whole system break, that point is a single point of failure .. SPOF
[19:35] (riel) 5. for high availability, you should have no SPOFs ... if one part of the system breaks, another part of the system should take over
[19:35] (riel) (this is called "failover")
[19:35] (riel) now I think we should explain a bit about how high availability works .. the technical side
[19:36] (riel) umm wait ... sorry marcelo ;)
[19:36] (marcelo) ok
[19:37] (marcelo) Lets talk about the basic components of HA
[19:37] (marcelo) Or at least some of them,
[19:38] (marcelo) A simple disk running a filesystem is clearly an SPOF
[19:39] (marcelo) If the disk fails, every part of the system which depends on the data contained on it will stop.
[19:41] (marcelo) To keep a disk from being a SPOF of the system, RAID can be used.
[19:42] (marcelo) RAID-1, which is a feature of the Linux kernel...
[19:43] (marcelo) Allows "mirroring" of all data on the RAID device to a given number of disks...
[19:44] (marcelo) So, when data is written to the RAID device, it's replicated to all disks which are part of the RAID1 array.
[19:44] (marcelo) This way, if one disk fails, the other disk (or disks) in the RAID1 array will be able to continue working
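The mirroring behaviour marcelo describes can be sketched in a few lines of Python (an illustration only; real RAID-1 on Linux is done by the kernel's md driver, not in user space):

```python
# Toy sketch of the RAID-1 idea: every write goes to all member "disks",
# so any surviving member can serve reads.

class Raid1:
    def __init__(self, n_disks: int):
        # each "disk" is just a dict of block number -> data
        self.disks = [dict() for _ in range(n_disks)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data  # replicate to every healthy member

    def fail(self, i: int) -> None:
        self.failed.add(i)  # simulate a disk dying

    def read(self, block: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block]  # any healthy mirror has the data
        raise IOError("all mirrors failed")

array = Raid1(2)
array.write(0, b"important data")
array.fail(0)           # first disk dies ...
print(array.read(0))    # ... the mirror still serves the data
```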

[19:24] (dre) marcelo) the Linux model of high availability is software based. <---- but it has to have some hardware restrictions ?
[19:24] (riel) dre: indeed
[19:24] (marcelo) no
[19:25] (riel) dre: you have to have a little bit of hardware support, but not much ... we will explain this later
[19:25] (dre) ok
[19:26] (BASH) what is a High Availability system ?
[19:26] (dre) BASH: reload the log
[19:26] (lclaudio) you can also use additional hardware to enhance your HA solution
[19:27] (BASH) reload the log?
[19:27] (riel) BASH: a "system" is the collection of hardware+software to give you a particular service ... say a web site
[19:27] (riel) BASH: if any part of the system breaks, the system is dead
[19:27] (BASH) then?
[19:27] (riel) BASH: in high availability, you have a system which breaks less often ... say, a maximum of 5 minutes a year
[19:28] (riel) BASH: while a "normal" system is sometimes down for more than 5 _days_ a year
[19:28] (BASH) oic
[19:28] (BASH) so what is marcelo trying to explain?
[19:28] (riel) BASH: the reasons behind this
[19:29] (BASH) ic

[19:45] (riel) because the system has a copy of the data on each disk
[19:45] (riel) and can just use the other copies of the data
[19:45] (riel) this is another example of "failover" ... when one component fails, another component is used to fulfill this function
[19:46] (riel) and the system administrator can replace (or reformat/reboot/...) the broken component
[19:46] (riel) this looks really simple when you don't look at it too much
[19:46] (riel) but there is one big problem ... when do you need to do failover?
[19:47] (riel) in some situations, you would have _2_ machines working at the same time and corrupting all data ... when you are not careful
[19:47] (riel) think for example of 2 machines which are fileservers for the same data
[19:47] (riel) at any time, one of the machines is working and the other is on standby
[19:47] (riel) when the main machine fails, the standby machine takes over
[19:47] (riel) ... BUT ...
[19:48] (riel) what if the standby machine only _thinks_ the main machine is dead and both machines do something with the data?
[19:48] (riel) which copy of the data is right, which copy of the data is wrong?
[19:48] (riel) or worse ... what if _both_ copies of the data are wrong?
[19:49] (riel) for this, there is a special kind of program, called a "heartbeating" program, which checks which parts of the system are alive
[19:49] (riel) for Linux, one of these programs is called "heartbeat" ... marcelo and lclaudio have helped writing this program
[19:49] (riel) marcelo: could you tell us some of the things "heartbeat" does?
[19:49] (marcelo) sure
[19:50] (marcelo) "heartbeat" is a piece of software which monitors the availability of nodes
[19:50] (marcelo) it "pings" the node which it wants to monitor, and, in case this node doesn't answer the "pings", it considers it to be dead.
[19:51] (marcelo) when a node is considered to be dead, we can fail over the services which it was running
[19:51] (marcelo) the services which we take over are previously configured on both systems.
[19:52] (marcelo) Currently heartbeat works only with 2 nodes.
[19:53] (marcelo) It's been used in production environments in a lot of situations...
[19:54] (riel) there is one small problem, however
[19:54] (riel) what if the cleaning lady takes away the network cable between the cluster nodes by accident?
[19:54] (riel) and both nodes *think* they are the only one alive?
[19:54] (riel) ... and both nodes start messing with the data...
[19:55] (riel) unfortunately there is no way you can prevent this 100%
[19:55] (riel) but you can increase the reliability by simply having multiple means of communication
[19:55] (riel) say, 2 network cables and a serial cable
[19:56] (riel) and this is reliable enough that the failure of 1 component still allows good communication between the nodes
[19:56] (riel) so they can reliably tell if the other node is alive or not
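Riel's rule, declare the peer dead only when it has been silent on every communication link for the dead time, can be sketched as follows (the link names and the 10-second dead time are illustrative assumptions, not taken from heartbeat's actual configuration):

```python
# Sketch of the "multiple means of communication" idea: a peer is declared
# dead only when heartbeats have been missing on *all* links long enough.

DEADTIME = 10.0  # seconds of total silence before declaring the peer dead

class PeerMonitor:
    def __init__(self, links, now=0.0):
        # last time a heartbeat was heard on each link
        self.last_heard = {link: now for link in links}

    def heartbeat(self, link, now):
        self.last_heard[link] = now

    def peer_is_dead(self, now):
        # dead only if *every* link has been silent for DEADTIME
        return all(now - t > DEADTIME for t in self.last_heard.values())

mon = PeerMonitor(["eth0", "eth1", "ttyS0"], now=0.0)
mon.heartbeat("ttyS0", now=25.0)   # network cables unplugged, serial still alive
print(mon.peer_is_dead(now=30.0))  # False: the serial link vouches for the peer
print(mon.peer_is_dead(now=40.0))  # True: silent on every link for > DEADTIME
```

This is why a single cleaning-lady accident on one cable is not enough to trigger a (possibly data-corrupting) false failover.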
[19:57] (riel) this was the introduction to HA
[19:57] (riel) now we will give some examples of HA software on Linux
[19:57] (riel) and show you how they are used ...
[19:58] (riel) ... we will wait shortly until the people doing the translation to Spanish have caught up ... ;)
[19:58] (marcelo) Ok
[19:58] (marcelo) Now let's talk about the available software for Linux
[20:02] (riel) .. ok, the translators have caught up .. we can continue again ;)
[20:02] (marcelo) Note that I'll be talking about the opensource software for Linux
[20:03] (marcelo) As I said above, the "heartbeat" program provides monitoring and basic failover of services
[20:03] (marcelo) for two nodes only
[20:04] (marcelo) As a practical example...
[20:04] (marcelo) The web server at Conectiva (www.conectiva.com.br) has a standby node running heartbeat

[19:45] (arjan) does this assume the device (disk) knows when it fails ?
[19:45] (marcelo) no
[19:46] (arjan) how does it handle a disk giving bad data then ?
[19:46] (dana) Are there going to be logs of this posted? (It's interesting but I gotta work) :(
[19:47] (lclaudio) Last time it was glued with the main talk in the webpage... I think dre has done the job :)
[19:48] (oroz) dana: yes... in http://umeet.uninet.edu
[19:48] (oroz) tomorrow
[19:58] (arjan) too bad you can't have a "vote out" with only 2 systems in a cluster
[19:59] (riel) arjan: quorum in a 2-node system is a project, yes
[20:00] (riel) arjan: but that's a bit too difficult a problem for this talk ;)
[20:00] (lclaudio) debUgo-: heartbeat is the name of the software tool and the name of the technique used by the tool.

[20:05] (marcelo) In case our primary web server fails, the standby node will detect that and start the apache daemon
[20:05] (marcelo) making the service available again
[20:05] (marcelo) any service can be used, in theory, with heartbeat.
[20:05] (riel) so if one machine breaks, everybody can still go to our website ;)
[20:05] (marcelo) It only depends on the init scripts to start the service
[20:06] (marcelo) So any service which has an init script can be used with heartbeat
[20:06] (marcelo) arjan asked if it takes over the IP address
[20:07] (marcelo) There is a virtual IP address used by the service
[20:07] (marcelo) which is the "virtual server" IP address.
[20:07] (marcelo) So, in our webserver case...
[20:08] (marcelo) the real IP address of the first node is not used by the apache daemon
[20:08] (marcelo) but the virtual IP address which will be used by the standby node in case failover happens
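The virtual IP mechanism marcelo describes can be sketched like this (a toy model with a made-up address; on a real heartbeat cluster the standby actually configures the virtual address on one of its network interfaces):

```python
# Toy sketch of the virtual IP idea: clients only ever use the service
# ("virtual") address; which real node answers for it changes on failover.

VIRTUAL_IP = "10.0.0.100"   # the address clients connect to (illustrative)

class Cluster:
    def __init__(self, primary, standby):
        self.holder = primary   # node currently answering for the virtual IP
        self.standby = standby

    def failover(self):
        # the standby claims the virtual address and starts the service
        self.holder, self.standby = self.standby, self.holder

    def resolve(self, ip):
        # which node a client request for this address really reaches
        return self.holder if ip == VIRTUAL_IP else None

c = Cluster("node1", "node2")
print(c.resolve(VIRTUAL_IP))   # served by the primary
c.failover()                   # primary dies, the standby takes the address
print(c.resolve(VIRTUAL_IP))   # same address, now served by the standby
```

The point of the indirection is exactly what marcelo says: clients never use a node's real address, so a failover is invisible to them.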
[20:09] (marcelo) Heartbeat, however, is limited to two nodes.
[20:10] (marcelo) This is a big problem for a lot of big systems.
[20:11] (marcelo) SGI has ported its FailSafe HA system to Linux recently (http://oss.sgi.com/projects/failsafe)
[20:11] (marcelo) FailSafe is a complete cluster manager which supports up to 16 nodes.
[20:11] (marcelo) Right now it's not ready for production environments
[20:12] (marcelo) But that's being worked on by the Linux HA project people :)
[20:12] (marcelo) SGI's FailSafe is GPL.
[20:13] (riel) another type of clustering is LVS ... the Linux Virtual Server project
[20:13] (riel) LVS uses a very different approach to clustering
[20:13] (riel) you have 1 (maybe 2) machines that receive http (www) requests
[20:14] (riel) but those machines don't do anything, except send the requests to a whole bunch of machines that do the real work
[20:14] (riel) so called "working nodes"
[20:14] (riel) if one (or even more) of the working nodes fail, the others will do the work
[20:14] (riel) and all the routers (the machines sitting at the front) do is:
[20:15] (riel) 1. keep track of which working nodes are available
[20:15] (riel) 2. give the http requests to the working nodes
[20:15] (riel) the kernel needs a special TCP/IP patch and a set of usermode utilities for this to work
[20:16] (riel) RedHat's "piranha" tool is a configuration tool for LVS, which people can use to set up LVS clusters more easily
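Conceptually, the director's two jobs riel lists (track which working nodes are available, hand requests to them) can be sketched in user-space Python, although the real LVS does this inside the kernel's TCP/IP stack:

```python
# Sketch of an LVS-style director: keep a list of working nodes and hand
# each incoming request to the next live one (round-robin scheduling).

class Director:
    def __init__(self, nodes):
        self.nodes = list(nodes)   # addresses of the working nodes
        self.alive = set(nodes)
        self.next = 0

    def mark_dead(self, node):
        self.alive.discard(node)   # health checks would drive this

    def dispatch(self):
        # hand the request to the next node that is still alive
        for _ in range(len(self.nodes)):
            node = self.nodes[self.next % len(self.nodes)]
            self.next += 1
            if node in self.alive:
                return node
        raise RuntimeError("no working nodes left")

d = Director(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([d.dispatch() for _ in range(3)])  # each node gets one request
d.mark_dead("10.0.0.2")
print([d.dispatch() for _ in range(2)])  # the survivors absorb the work
```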
[20:16] (riel) in Conectiva, we are also working on a very nice HA project
[20:17] (riel) the project marcelo and Olive are working on is called "drbd"
[20:17] (riel) the distributed redundant block device
[20:17] (riel) this is almost the same as RAID1, only over the network
[20:17] (riel) to go back to RAID1 (mirroring) ... RAID1 is using 2 (or more) disks to store your data
[20:17] (riel) with one copy of the data on every disk
[20:18] (riel) drbd extends this idea to use disks on different machines on the network
[20:18] (riel) so if one disk (on one machine) fails, the other machines still have the data
[20:18] (riel) and if one complete machine fails, the data is on another machine ... and the system as a whole continues to run
[20:19] (riel) if you use this together with ext3 or reiserfs, the machine that is still running can very quickly take over the filesystem that it has copied to its own disk
[20:19] (riel) and your programs can continue to run
[20:20] (riel) (with ext2, you would have to do an fsck first, which can take a long time)
[20:20] (riel) this can be used for fileservers, databases, webservers, ...
[20:20] (riel) everything where you need the very latest data to work
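The "RAID1 over the network" idea can be pictured like this (a toy sketch: the "network" here is a direct method call, whereas the real drbd replicates block writes between machines over TCP):

```python
# Sketch of the drbd idea: a block device whose writes are also applied on
# a peer machine, so the peer can take over with up-to-date data.

class DrbdNode:
    def __init__(self):
        self.blocks = {}   # block number -> data, our local "disk"
        self.peer = None

    def connect(self, peer):
        self.peer, peer.peer = peer, self

    def write(self, block, data):
        self.blocks[block] = data
        if self.peer is not None:
            self.peer.blocks[block] = data  # mirror over the "network"

primary, standby = DrbdNode(), DrbdNode()
primary.connect(standby)
primary.write(7, b"latest data")
# if the primary machine now dies, the standby already holds the same blocks
print(standby.blocks[7])
```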
[20:20] (riel) ...
[20:21] (riel) this is the end of our part of the lecture, if you have any questions, you can ask them and we will try to give you a good answer ;)
[20:21] (Fernand0) plas plas plas plas plas plas plas plas
[20:21] (riel) [btw, this whole lecture was improvised by marcelo and me ... sorry if it was a bit messy at times ;)]
[20:21] (Fernand0) plas plas plas plas plas plas plas plas
[20:21] (MJesus) plas plas plas plas plas plas plas plas

[20:05] (arjan) does it also take over the IP address ?
[20:06] (riel) arjan: I guess so ... ;)

[20:22] (MJesus) plas plas plas plas plas plas plas plas
[20:22] (Fernand0) plas plas plas plas plas plas plas plas
[20:22] (Fernand0) questions here or in #qc riel ??
[20:22] (riel) Fernand0: I think we just got a question in #qc, so lets continue there ;)
[20:22] (Fernand0) nice
[20:23] (Fernand0) questions and comments in #qc
[20:23] (marcelo) arjan asked "are IS departments actually happy with the IP failover ?"
[20:23] (marcelo) Yes.
[20:23] (marcelo) :)
[20:23] (riel) arjan: if you need high availability, IP takeover should not be a problem
[20:25] (riel) any other questions ?
[20:26] (riel) if anybody wants more information, you can go to http://www.linux-ha.org/

[20:22] (arjan) are IS departments actually happy with the IP failover ?
[20:23] (arjan) hehe
[20:23] (arjan) not where I work :)
[20:23] (marcelo) why?
[20:23] (arjan) they don't seem to be able to "debug" the network because
[20:24] (arjan) of the "takeover"s
[20:24] (riel) arjan: that's their problem
[20:24] (marcelo) I dont see the problem
[20:24] (riel) arjan: no HA for your IS people, then .. ;)
[20:24] (arjan) riel: and not for my servers either
[20:24] (riel) marcelo: there is no problem
[20:24] (riel) marcelo: it is a management thing to not allow IP takeover
[20:25] (riel) any other questions ?
[20:26] (riel) questions and random discussion can always continue here
[20:27] (Fernand0) well, a question ....
[20:27] (Fernand0) a diffuse question, i'm afraid
[20:27] * riel listens
[20:27] (Fernand0) HA is about having the right information in the right place, isn't it?
[20:27] (Fernand0) at least, part of the problem is this, if I understand correctly
[20:28] (Fernand0) can it be related to memory problems presented in the talk of the past day ?
[20:28] (Fernand0) or are they completely different problems?

[20:30] (debUgo-) riel: drbd is based on nbd?
[20:30] (marcelo) debugo-, no, it's not.
[20:30] (riel) Fernand0: they are very different problems
[20:31] (debUgo-) marcelo: ok, thx
[20:31] (marcelo) debugo, it's similar to nbd in some ways...
[20:31] (marcelo) but it's not based on nbd

[20:33] (debUgo-) marcelo: do you think that InterMezzo and other GPL'ed distributed fs are stable enough for production HA environments?
[20:34] (riel) debUgo-: InterMezzo is almost ready, from what I know
[20:34] (marcelo) debugo, not yet.
[20:34] (marcelo) GFS is quite close to being used in production environments, though.

[20:43] (debUgo-) riel/marcelo: have you ever checked IBM's AFS or EVMS? Do you think that this contribution from IBM would make Linux stronger?
[20:43] (debUgo-) if grammar sux, it's not my fault :)
[20:44] (riel) debUgo-: I haven't used AFS, but have heard good things about it
[20:44] (riel) debUgo-: IBM promised they would release AFS in October (September?) 2000
[20:44] (marcelo) riel, they released already :)
[20:44] (debUgo-) well... there is OpenAFS
[20:44] (riel) debUgo-: but they have not released anything yet or told anybody what is happening ...
[20:44] (marcelo) I haven't looked at it yet, though.
[20:44] (riel) marcelo: they did?
[20:44] (riel) marcelo: cool
[20:44] (marcelo) debugo, what is EVMS?
[20:44] (debUgo-) riel: http://oss.software.ibm.com/developerworks/opensource/afs/ go and hack it =)
[20:44] (riel) marcelo: last month, it wasn't released yet ;)
[20:45] (marcelo) ls
[20:45] (riel) debUgo-: not me...
[20:45] (marcelo) idiot
[20:45] (debUgo-) EVMS: Enterprise Volume Management System
[20:45] (riel) debUgo-: but I'm looking forward to your patches
[20:45] (debUgo-) wow
[20:45] (marcelo) debugo, LVM 1.0 will include a clustered LVM
[20:45] (riel) debUgo-: ahh, yes
[20:45] (riel) debUgo-: EVMS is a useful tool
[20:45] (marcelo) and I don't know if IBM wants to make EVMS open source
[20:46] (trusmis) clustered lvm?
[20:46] (riel) debUgo-: but I don't know if it will be better than the linux LVM or not
[20:46] (riel) marcelo: they will
[20:46] (marcelo) trusmis, yes.
[20:46] (riel) marcelo: I talked about it with an EVMS developer in Miami
[20:47] (marcelo) trusmis, so you can, for example, have different nodes mess (resize, etc) with a shared device
[20:47] (marcelo) without conflicts
[20:47] (debUgo-) seems pretty cool...
[20:47] (debUgo-) so, i just need a couple of machines to test :o/
[20:47] (riel) debUgo-: ;)
[20:47] (trusmis) marcelo: in one computer or clustered beetween several computers?
[20:48] (debUgo-) trusmis: clustered... over network
[20:48] (riel) trusmis: clustered nodes ... but with _shared_ disks
[20:48] (riel) trusmis: a bunch of disks are one part of the "network"
[20:48] (riel) trusmis: and some CPU+memory+... are the other parts of the "network"
[20:48] (marcelo) debugo, not really
[20:49] (marcelo) the access is done via shared SCSI or fibre channel;
[20:49] (riel) trusmis: and you have multiple computers using the same disk(s)
[20:49] (debUgo-) marcelo: wow! that's really cool... some companies would trash their WinNT solutions =)

[20:53] (debUgo-) that 3 computers can access data simultaneously?
[20:54] (riel) trusmis: with all those computers talking to the same disk
[20:54] (riel) debUgo-: what is so difficult about that? ;)
[20:54] (riel) debUgo-: yes
[20:54] (debUgo-) i'm thinking about postgresql...
[20:54] (riel) debUgo-: and the difficult part in software is to make sure they don't corrupt each other's data
[20:55] (debUgo-) each machine runs a postgresql instance, and all 3 machines read the -same- database
[20:55] (Ricardo) debUgo-: you mean the same physical representation of the database.
[20:55] (riel) debUgo-: in that case, postgresql has to make sure it doesn't mess with its own data
[20:55] (riel) Ricardo: nope
[20:56] (riel) Ricardo: we mean "1 disk shared by 3 computers"
[20:56] (Ricardo) riel: I meant "physical" ;)
[20:56] (Ricardo) riel: Well, 'logical' seems more adequate :)
[20:56] (riel) Ricardo: wrong
[20:57] (Ricardo) Uh
[20:57] (riel) Ricardo: 1 _physical_ disk
[20:57] (Ricardo) Well
[20:57] (debUgo-) gotta go... boss seems angry
[20:57] (Ricardo) I missed something :?
[20:57] (riel) Ricardo: shared by _3_ computers
[20:57] (debUgo-) =P
[20:57] (riel) debUgo-: bye
[20:57] (Ricardo) Ok
[20:57] (riel) Ricardo: you seem to be confusing logical and physical ;)
[20:57] (debUgo-) thanks riel, marcelo
[20:57] (trusmis) my point: what's the difference from nfs?
[20:57] (debUgo-) bye *
[20:57] (MJesus)thank debugo,
[20:58] (riel) trusmis: with NFS, if the server is down nobody can reach the data
[20:58] (Ricardo) riel: So I was right the first time. The three servers are going to access the physical rep. (files) of the DB :)
[20:58] (riel) Ricardo: yes
[20:59] (Ricardo) riel: well :)
[20:59] (trusmis) so the device is not connected to a computer? (seems strange)
[20:59] (riel) trusmis: the disk is connected with computers allright
[20:59] (riel) trusmis: not just 1, but _3_
[21:00] (trusmis) physically connected? i mean 3 connectors?
[21:00] (riel) trusmis: no
[21:00] (riel) trusmis: 1 connector on the disk, 1 connector on each computer
[21:01] (riel) trusmis: and 1 cable going to all 4 connectors
[21:01] (riel) trusmis: how is this different from 1 computer with 3 disks and 1 cable going to 4 connectors? ;)
[21:01] (trusmis) ok, got it

[21:03] (trusmis) and that's very expensive?
[21:06] (riel) trusmis: depends
[21:07] (trusmis) on what?
[21:07] (trusmis) well, i imagine on what
[21:07] (trusmis) but, compared with ide devices, is it 3-4 times more?
[21:07] (riel) trusmis: possibly
[21:08] (riel) trusmis: but I don't know for sure ... I never buy those things ;)
[21:09] (lclaudio) Marcelo told me that a "host adapter" for Fibre Channel may cost US$500
[21:09] (trusmis) i have missed the talk but i imagine you have talked about distributed computing
[21:09] (marcelo) no
[21:09] (marcelo) 300$
[21:10] (marcelo) a new, very good one.
[21:10] (lclaudio) ooops... :) and you need one of them for each host...
[21:11] (MJesus) 300*200 = 60000 pts uh! only?
[21:12] (trusmis) any idea if something that distributes the load (computing) between computers, or something like that, is going to be included in the linux kernel?
[21:12] (trusmis) not only devices but even distributing the processes
[21:13] (riel) trusmis: I don't know
[21:13] (riel) trusmis: some things, maybe
[21:14] (riel) trusmis: other things are just too complex for the standard Linux kernel
[21:14] (oroz) bye!
[21:14] (Ricardo) bye oroz
[21:14] (trusmis) i think a http server is too complex too
[21:15] (lclaudio) trusmis: there are some efforts on load distribution for services, like LVS.
[21:15] (trusmis) yes i know
[21:16] (trusmis) i just wonder if there has been any thread about it on lkml
[21:16] (lclaudio) Linux isn't a distributed system at all... there are the Mosix people trying to develop a distributed version of Linux (IIRC) and the default tools for parallel processing (PVM, MPI,...)
[21:17] (trusmis) lclaudio: i know all these things, i want to know about any effort i don't know about inside kernel hackers

[21:20] (trusmis) can linux now be considered a good high availability system, or does it still need many things?
[21:20] (lclaudio) trusmis: so... no news :)
[21:22] (lclaudio) trusmis: it's kinda difficult to answer this question... there are lots of scattered tools for HA and now the community is working on getting them to work together.
[21:23] (lclaudio) trusmis: I think the answer is yes. Aside from some points you can't address right now, you can have a good level of HA with the currently available tools
[21:23] (lclaudio) trusmis: it depends mainly on what you wanna do.
[21:23] (trusmis) what's the thing you think linux lacks most?
[21:23] (trusmis) about HA of course

[21:25] (lclaudio) Right now I miss a distributed filesystem. There are some projects near production level, but "near" doesn't suffice for HA.
[21:26] (lclaudio) A good distributed filesystem would make the file consistency and coherence problem simpler to deal with.
[21:26] (Ricardo) Chris Isaak-Wicked Game.mp3:
[21:26] (Ricardo) Ups
[21:26] (Ricardo) O:)
[21:26] (Ricardo) Sorry
[21:26] (trusmis) you mean "i can have /usr on another computer??" or "i can access another computer's hd in /name"?
[21:26] (Ricardo) Now the right one (damned clipboard)
[21:26] (Ricardo) Uhh... I'm afraid I have to cut this interesting talk short, people :) We have been here for nearly two and a half hours. I think we "officially" close this lecture for today.
[21:27] (Ricardo) Of course, you can keep talking :)
[21:27] (MJesus) plas plas plas plas plas plas plas plas
[21:27] (Ricardo) It's only a "bureaucratic" thing :)
[21:27] (MJesus)hehehhehehe and for the clap clap
[21:28] (lclaudio) trusmis: I mean you can have all your files distributed and replicated over your network. Reliably and safely.
[21:28] * lclaudio handclaps :)
[21:29] (MJesus) sorry for the break
[21:29] (trusmis) like nfs but with all your HDs and over the whole network
[21:29] (trusmis) ?
[21:29] (trusmis) interesting and difficult to do
[21:29] (lclaudio) Not like NFS... completely different.
[21:30] (trusmis) the problem is that then you may have a lot of replication...
[21:30] (lclaudio) trusmis: go to linux-ha.org and take a look at Intemezzo, Coda, M2FS, ...
[21:31] (trusmis) ah, ok
[21:31] (MJesus) let me introduce trusmis... he is also a linux developer
[21:31] (lclaudio) trusmis: no. There are lots of good algorithms in this area, designed to avoid heavy network traffic and such
[21:32] (lclaudio) MJesus: excuse me for breaking your "clap plas" rainbow :)
[21:32] (trusmis) lot of replication= i don't mind if you switch off that computer
[21:33] * trusmis is really interested and is going to look into it



Contact: umeet@uninet.edu