IV International Conference of Unix at Uninet
Andrew Frederick Cowie

AfCfernand0: Andrew Cowie here. Lag check: this message sent at 20:55:00 UTC
fernand0ok
truluxhey fernand0!
fernand0my clock says not much lag
trulux;)
AfCJust wanted to know whether I should try connecting to a different server. If this is fine, it is fine :)
fernand0it's fine
AfCfernand0: you may want to add this link to the topic:
AfCfernand0: http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/
AfCThose are some slides to accompany my chat.
fernand0ok
fernand0too long
fernand0he
AfCok
fernand0:)
AfC[ready, start logging whenever]
fernand0so far so good
fernand0Hello,
fernand0we have here today Andrew Frederick Cowie, from Australia. His company is
fernand0Operational Dynamics.
fernand0
fernand0He will talk about "Modern trends in Unix infrastructure
fernand0management [systems administration]."
fernand0
fernand0We are grateful for this talk.
fernand0We are also grateful to all of you for coming
fernand0here.
fernand0
fernand0There is some material for the presentation at:
fernand0http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/
fernand0
fernand0As usual, this channel is for the talk, #qc for questions and comments, and
fernand0#redes for translation.
fernand0
fernand0AfC ...
AfCGood evening. Of course, it's morning here, GMT+11
AfC I apologize that unfortunately I do not speak Spanish, although I am
AfC aware that many of you here speak it as your mother tongue.
AfC I grew up
AfC in Canada and also speak French, but I doubt that will help me much today.
AfC As a consequence, I will give my presentation in English, but I will try not
AfC to use very colloquial words or idiomatic phrases.
AfC[a friend translated that for me :)]
AfCSo what I want to talk to you about today is
AfCabout some trends that I have observed
AfCin the Infrastructure Management world.
AfCIt's definitely a Unix / Linux centric chat,
AfCalthough I should observe that the problem of managing Unix systems
AfCis inevitably a problem of managing systems from different "vendors" -
AfCie, a heterogeneous network.
AfCThat said, I pity the people with Windows servers in their datacenters.
AfC[[[go to slide 1 http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img1.html ]]]
AfCSo as I hope most of you have seen, I wrote up a few slides for today.
AfCThey are "informal" in the sense that this isn't a long-winded, carefully crafted set of corporate slides.
AfC[I do that for physical conferences]
AfCbut I wanted to provide some "eye candy" for you to look at as I talk
AfCThis talk is about TRENDS which usually is defined to mean "the way things are expected to be in the future"
AfCbut before I can get too far with that, I need to sketch out some history
AfCspecifically, Unix and Linux history According To Andrew (tm)
AfCThe funny thing about the last 30 years has been the cyclic moves between centralized and decentralized systems
AfCearly on we had one big Unix machine, with lots of "dumb" terminals connected to it by serial cables.
AfCMultiple users, on a single multiprocessing machine. That was the way mainframes were, and when Unix came on the scene, it was awesome at this.
AfCThis started to change with the advent of the graphical workstation for scientific modeling, visualization, and graphics. The workstations from SGI, Sun, HP, Digital, etc were very powerful machines in their own right, but they had an entire Unix installation on each one.
AfCImagine! Instead of one Unix machine in a department, there were now 10 - the central one, and then all these crazy workstations. Suddenly, the challenge of managing Unix exploded in complexity.
AfCThings like files and administration were, in those early days, still centralized. This is when NFS and NIS came out of Sun.
AfCA short while later...
AfC[Notice the suspicious lack of dates :)]
AfCIBM launched the Personal Computer. With the meteoric rise in popularity and affordability of the desktop PC, suddenly there was impressive computing power on individual machines ... but they were all disconnected, standalone, (and, not Unix).
AfCData transfer was "how fast can I carry a floppy disk from one person to another".
AfCThen came the PC networks - file services like Banyan and Novell Netware got their start in these environments. And here we had VERY slow networks trying to ship files around from a central file server.
AfCAbout this time came the small Unix variants - Xenix, later SCO Unix; Minix; the *BSDs; and then one day, Linux.
AfCAnd the rest, as they say, was history.
AfCNow. This is nothing terribly novel or new. But consider these trends as they have impacted the datacenter.
AfCToday, one of the major challenges is enterprise computing - providing the computing power necessary for some business application. Perhaps that is an e-commerce platform like E-Bay, or the financial back end of a bank.
AfCOne consistent theme that emerges is the attempt to SCALE these platforms
AfCto deal with increased load. There are two strategies, of course - you can scale vertically; that is,
AfCadd more CPUs, faster drives, more memory, faster network switches; or you can try and scale horizontally,
AfCthat is, spread the load around smaller but more numerous machines.
AfCIn a typical e-commerce platform, you end up with both
AfCie, lots and lots of web servers (small machines), a smaller number of application servers in the middle layer (on stronger mid range hardware) all standing on a central core database engine running on some massive huge machine with as many CPUs and as much memory as can be afforded, and with a massive disk array behind it.
AfCThat's the standard "three tier architecture" (we'll be returning to that in a bit)
AfCbut basically it means large numbers of machines - *many of which are different*.
AfCSo as people try to scale further and further horizontally, they end up running out of rack space in their datacenter to put all these servers.
AfCwhich is pretty much where the market pressure that led to "blade servers" came from - smaller and smaller servers packed into as small a space as possible.
AfCNever mind how much power they draw or how much heat they generate...
AfCSo not too long ago we had 10s of machines to manage... now we typically have 100s or more to manage in the same space.
AfCClearly, you can't administer those systems by hand.
AfC(which is another theme we'll return to)
AfCOf course, as large shops started to have thousands of machines, they started looking for ways to *cut down* the number of physical servers ... which led to the trend towards "virtualization", which is running multiple virtual servers on a single physical machine.
AfCOne of my colleagues, Director of Operations at Charles Schwab (an online trading brokerage), noted that they're trying to cut from > 4000 machines down to 1500.
AfC(Yikes!)
AfCVirtualization is no magic silver bullet, however much the people selling virtual server solutions might like it to be:
AfCTo quote Robert Heinlein - "There's No Such Thing As A Free Lunch" - if your application is CPU bound, then piling more server instances on a physical machine doesn't get you anything - they're all fighting for the same CPU resource. If they're IO bound, and IO is highly available relative to CPU, then perhaps you have an opportunity to gain some advantage
AfChowever, (and this is the point that everyone misses) 50 virtual web servers on a single physical machine *all have to share the same NIC*!!!
AfCand suddenly, the I/O slack (disks) that you thought you were getting advantage from is negated by the I/O through the network interface being jammed up.
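[A rough back-of-the-envelope sketch of that shared-NIC arithmetic. The 1 Gbit NIC, 50 guests, and 30 KB average response are assumed figures for illustration, not numbers from the talk:]

    # Bandwidth left to each virtual server when many guests share one
    # physical NIC. All figures are assumptions chosen for illustration.
    NIC_CAPACITY_MBIT = 1000     # one gigabit NIC on the physical host
    GUESTS = 50                  # virtual web servers consolidated onto it
    AVG_RESPONSE_KBYTE = 30      # assumed average HTTP response size

    per_guest_mbit = NIC_CAPACITY_MBIT / GUESTS
    responses_per_sec = (per_guest_mbit * 1000 / 8) / AVG_RESPONSE_KBYTE

    print(f"Each guest gets ~{per_guest_mbit:.0f} Mbit/s of the NIC,")
    print(f"roughly {responses_per_sec:.0f} responses/second before it saturates.")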
AfCSo the summary there is that there *are* no easy answers, and neither blade servers (increased physical density) nor virtual servers (increased logical density) are an automatic solution to your challenges.
AfCThat was all quite generic, I grant you. It bears on the subject "Unix Infrastructure Management" quite directly, though .... all these thousands of machines are running Unix (or more likely Linux) ... and YOU are the admin who has to make it work
AfC-
AfCAnother trend (and this will be quick) that we've seen evidence of is the proliferation (massive growth) of web interfaces to systems.
AfCAny of you who have broadband at home no doubt have a little hardware modem/firewall/router/switch device... and you configure it with a web interface, right?
AfCWell, everything else is like that too, these days.
AfCThe load balancers are controlled by a web interface...
AfCThe firewall config is controlled by a web interface...
AfCThe packet shaper is controlled by a web interface...
AfCThe database traffic monitor is a web interface...
AfCYour trouble ticket system is a web interface...
AfCYour knowledge base is a web interface...
AfCYou get the idea.
AfCThe last one is fair. But there are a few more which are of great concern:
AfCYour system monitoring & alert system is a web interface.
AfCYOUR DAMN WEB SITE IS A WEB INTERFACE :)
AfCThe trouble for Unix admins is that we quietly, and without anyone stopping to think about it too much, went from a world where we could certainly control a single system node from a command line interface,
AfCand with careful effort (see later in the presentation) could drive an entire production platform of 10s or 1000s of machines using command line tools,
AfCto a situation where you have to log into endless *specific* web interfaces to adjust the configuration of devices and sub-systems.
AfCYADWIIHYLIT
AfCYet
AfCAnother
AfCDamn
AfCWeb
AfCInterface
AfCI
AfCHave
AfCTo
AfCLog
AfCIn
AfCTo!
AfCYADWIIHTLIT
AfC(whatever)
AfC:)
AfCthe point is that the one thing that can protect your sanity - automating systems administration tasks - has been seriously undermined by the fact that you can no longer centrally define, manage, or repair a datacenter in its entirety.
AfC[And worst of all, how many different logins and passwords for each stupid device and sub-system do you have to remember / record somewhere? Out of control. And no, SNMP doesn't cut it]
AfC[[[ Time for the next slide. See http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img2.html ]]]
AfCI'm not going to belabour the point about complexity being your enemy. But I will comment on one trend which is of great concern: because they are independent commercial entities, vendors do not work together to provide you with an integrated solution to your problems.
AfCThis isn't about whether or not you outsource to IBM; this is about the simple fact that YOU are the ones who have to deal with the complexity of your system
AfCand very simply, the more interconnections you have, the worse off you are going to be.
AfCI'm going to jump to another set of slides for a minute.
AfCBecause I want to show you something:
AfCPlease load up:
AfChttp://www.operationaldynamics.com/reference/talks/SurvivingChange/img16.html
AfC[I'm watching the web log, so I know who is paying attention :)]
AfC[Come on, yes, this means you. Load that link :)]
AfCNow follow the slides until you get to img24, that is
AfChttp://www.operationaldynamics.com/reference/talks/SurvivingChange/img24.html
AfCThose pictures are a simple visual depiction of the effect of change. You add one more node to a system, and the resultant change you have to deal with IS NOT LINEAR.
AfCThe problem is that the resources you have to deal with it are, even in the best-case scenario, ONLY growing linearly... which means
AfCthat there is always a gap between available resources and the complexity of the system you have to manage.
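[To make the non-linearity concrete: each new node can potentially interconnect with every node already present, so the number of possible interconnections grows as n(n-1)/2 while the resources to manage them grow at best linearly. A tiny sketch:]

    # Possible pairwise interconnections in a system of n nodes (n choose 2)
    # grows quadratically; the resources to manage them grow linearly at best.
    def interconnections(n: int) -> int:
        return n * (n - 1) // 2

    for n in (5, 10, 20, 40):
        print(f"{n:3d} nodes -> {interconnections(n):4d} possible interconnections")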
AfCSo if you want to be a happy Unix person, you need to approach things in new ways.
AfCBack to today's slides:
AfChttp://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img2.html
AfCthe sort of complexity evident in this diagram is common in Unix / Linux production platforms, and yet managing change to that platform is hellishly difficult - with the result that mistakes are made, which result in down time, increased costs, lost revenue, etc.
AfCAnd yet, the vendors of individual products are all happily adding features which supposedly help you, but end up simply sending you further down the direction of having an overly complex, unmanageable, unmaintainable system.
AfCSo, before I go on,
AfCI think I will take a short pause here so that those who are
AfCtranslating have a chance to catch up to this point.
AfC I hope that
AfC everyone is following me without problems.
AfC
AfC  
AfC[[[ next slide: http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img2.html ]]]
AfCoops, cut and paste error. Slide 3
AfC[[[ http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img3.html ]]]
AfC*there* :)
AfC[I told you these slides were mostly eye candy, but hey, it's all fun]
AfCSo I want to talk now about a strange trend that I've observed in industry over the last few years: mass simplification
AfCI've been talking about the "typical" e-commerce platform. 3-tier architecture of web servers, application servers, and database engine
AfCwhich is what is underneath any big .com site (eBay, amazon.com, etc) and likewise underneath any big enterprise system (like the financial system in a large company, a bank's web site, etc).
AfCThere is no problem with this. It works fine. It is a relatively well understood environment
AfCbut it is bloody COMPLEX
AfCand bloody EXPENSIVE
AfCSo I want to mention 3 examples of people doing things a little differently.
AfCas an illustration of thinking in creative ways about how to manage complexity, and how to manage change
AfC[the trend is that people ARE thinking creatively. Wow!]
AfCAt Yahoo!,
AfCthey have three tiers,
AfCbut it's all ... shifted ... up one layer.
AfCWhen you hit Yahoo, you actually hit a squid reverse-proxy web cache
AfCthey have a huge farm of them.
AfCall their content is properly fine-tuned with appropriate Expires, Last-Modified, and cache-control settings
AfCso most of their content quickly gets up to the squid caches and is served from there. No need to hit the "real" web server....
AfC(which means that the squid boxes are, in effect, acting as the web servers)
AfCFurther, even for their most volatile content, their squid boxes are configured not to bother going back to the web servers for 5 seconds.
AfCwhich further reduces their load.
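[The "fine tuning" is mostly a matter of emitting HTTP headers that let the caches in front keep serving content. A minimal WSGI sketch of the idea, not Yahoo's code; the 5-second max-age mirrors the "don't go back to the web servers for 5 seconds" policy:]

    # Minimal origin server emitting cache headers that allow a reverse
    # proxy (e.g. squid) to serve content without hitting the origin.
    # An illustration of the idea only, not code from Yahoo.
    from wsgiref.simple_server import make_server
    from wsgiref.handlers import format_date_time
    import time

    def app(environ, start_response):
        body = b"<html><body>hello from the origin</body></html>"
        start_response("200 OK", [
            ("Content-Type", "text/html"),
            ("Content-Length", str(len(body))),
            ("Cache-Control", "public, max-age=5"),   # even "volatile" pages tolerate 5s
            ("Last-Modified", format_date_time(time.time())),
        ])
        return [body]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()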
AfCThe next strange thing they did is that the web servers ARE the application servers. The web servers run PHP, and that's where their application layer is. They don't bother (at least in many of their applications) having a middle tier of app servers, which means they don't have to deal with the network complexity of trying to load
AfCbalance and manage connections between web servers and app servers
AfC(smart move - wiped out an entire layer of complexity... and one which is notoriously hard to analyze and debug)
AfCfinally, they use MySQL in replication mode - 10s of database servers that are all just read-only slaves; there's one write master server somewhere, but MySQL can and does handle all that transparently without too much trouble
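[On the application side, that layout just means routing writes to the master and spreading reads across the slaves. A sketch of the idea; the hostnames and the connect() helper are hypothetical placeholders:]

    # Application-side read/write splitting against a replicated database:
    # writes go to the single master, reads go to any read-only slave.
    # Hostnames and connect() are hypothetical placeholders.
    import random

    WRITE_MASTER = "db-master.example.com"
    READ_SLAVES = [f"db-slave{i:02d}.example.com" for i in range(1, 11)]

    def connect(host):
        print(f"connecting to {host}")     # real code would open a MySQL connection
        return host

    def run_query(sql: str):
        is_write = sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
        conn = connect(WRITE_MASTER if is_write else random.choice(READ_SLAVES))
        # ... execute sql on conn ...

    run_query("SELECT * FROM items WHERE id = 42")
    run_query("UPDATE items SET price = 10 WHERE id = 42")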
AfCSo how's that compared to a Websphere + DB2 + huge massive IBM server, or Weblogic + Oracle + huge Sun server?
AfCit's low tech, but it has less complexity; is easier to manage and tune; and we all know Yahoo has high performance.
AfCSo that's an example of doing something architecturally that can radically change the Unix admin challenge; but here's one better. Google.
AfCNow, admittedly, they have more servers than there are people in China
AfCbut their approach to web server maintenance is brilliant.
AfCDo you know what Google does if one web server dies?
AfCNOTHING.
AfCDo you know what Google does if 5 web servers in a rack are dead?
AfCNOTHING.
AfCThey basically wait until an *entire* cabinet of machines has failed, then unplug its uplink and SAN link, get a cart, and wheel it back to the shop to be mass fixed / reprovisioned / whatever.
AfCThat means that they can maintain servers on an assembly line scale.
AfCAnd that means less effort per failure - and more importantly - fixing things during the day shift, not at 3am.
AfCFinal quick trend: Akamai - a global caching company. They have > 10,000 servers in ISPs spread around the world. In this case, their admin problem was maintaining servers in datacenters they don't control and never see. So the machines are simple, they have *everything* in the kernel (web cache server, ET-phone-home control logic, etc); it boots off of flash. And they have multiple nodes per location; if one fails, no big deal.
AfCTHAT is the trend in Unix infrastructure management that is actually the heart of my talk:
AfC"If it fails, no big deal"
AfCWhich is pretty impressive.
AfCOf course it is possible to over simplify...
AfC[[[ http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img4.html ]]]
AfCOk. Onto the meat:
AfC[[[ http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img5.html ]]]
AfCI've been going on about complexity, and its cost.
AfCThe significant trend in the broader Unix industry isn't a trend in the "young Japanese women like shiny new cell phones" sense, but rather is people trying to deal with challenges that, frankly, vendors don't give them help with.
AfCWhat is really impressive is that faced with the task of trying to manage 10s or 100s of servers in a datacenter, university department, or business office, people quickly realized that "doing things by hand" wasn't good enough.
AfCThere are a number of problems:
AfC* need to add systems (horizontal scaling) easily
AfC* need to be able to deploy systems in the first place (roll out a 10,000 workstation trading floor, anyone?)
AfC* need to be able to replace the entire infrastructure somewhere else in the event of disaster.
AfCSo people started figuring out ways to automate systems administration.
AfCOne slide I forgot to put in was a reference to this chart, so I'll just link it directly:
AfC[[[ http://www.infrastructures.org/papers/turing/images/t7a_automation_curve.png ]]]
AfCThis graphic acknowledges that implementing automation techniques does have an initial cost, but that you quickly break even in terms of staff time.
AfC[[[ back to me, http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img5.html ]]]
AfCthe key learning in the Unix infrastructure world has been to view an organization's machines not as a group of individual nodes, but rather to deal with the system as a single entity - a virtual computer, if you will.
AfCEveryone here has written little scripts to automate basic tasks, right?
AfCHow many of you have written automation tools which take care of deploying, configuring, and maintaining an entire group of machines *over time*?
AfCWell, it turns out, it's a tricky problem.
AfCIn fact, it's incredibly difficult.
AfC[[[ next slide http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img6.html ]]]
AfCAnd that's another trend I want to mention.
AfCThis time it's a negative trend:
AfCmost people attempt to cook up their own ad-hoc solution to infrastructure and configuration management.
AfCwhich is a terrible idea.
AfCPeople have been at this for a while, people with huge datacenters, large research university campuses, investment bank trading floors,
AfCand it turns out there are some unexpected traps.
AfCMost of you will have heard of cfengine, by Mark Burgess
AfCcfengine follows a strategy of "convergence" - it attempts to converge a system's config files to a desired end state.
AfCthe problem is that most of the time, it is not deterministic.
AfCAlva Couch of Tufts University in the US, a computer science professor, showed this in a mathematical proof
AfCwhich means that, unless you know what pitfalls to avoid, if you use cfengine you will, by definition, not know what state your system will end up in.
AfC(which is no help)
AfCThe other major strategy is "congruence", which is when configuration files are generated, typically using some kind of macro substitution. The major problem that occurs here
AfCis around specificity - let's say you have an Apache config for a normal web server, and an Apache config for a secure web server, and you install them both on the same machine. What happens?
AfC[isconf, psgconf follow these sorts of strategies]
AfCwhich brings us to a generic problem in any configuration management system - how *do* you specify the configuration?
AfCyou need some language to express the configuration, and you want to be able to specify different configurations for different classes of machines (mail servers, web servers, dns machines, etc)
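[A toy sketch of the "congruence" idea: a host's configuration files are generated from templates according to the classes the host belongs to, rather than edited in place. The class names, templates, and settings are invented for illustration:]

    # Toy congruence-style generator: configuration is produced from
    # templates per machine class. Names and templates are invented.
    from string import Template

    TEMPLATES = {
        "webserver": Template("Listen ${http_port}\nServerName ${hostname}\n"),
        "mailserver": Template("myhostname = ${hostname}\nrelayhost = ${relay}\n"),
    }

    HOSTS = {
        "web01": {"classes": ["webserver"], "http_port": 80},
        "mail01": {"classes": ["mailserver"], "relay": "smtp.example.com"},
    }

    def generate(hostname):
        settings = {"hostname": hostname, **HOSTS[hostname]}
        return {cls: TEMPLATES[cls].substitute(settings)
                for cls in HOSTS[hostname]["classes"]}

    for host in HOSTS:
        for cls, config in generate(host).items():
            print(f"--- {host} ({cls}) ---\n{config}")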
AfCIt's a tough problem. Many people have tried and stumbled :)
AfCThere is a long-standing body of research on this topic; there has been an impressive amount of research published at the LISA conferences over the last few years. The conference proceedings are available online at usenix.org (see my slide for a link)
AfCSee also two other primary resources: Mark Burgess (now a full professor of this stuff! Not bad for having started an open source project) maintains a lot of excellent material at the cfengine website,
AfCand I would point you to the infrastructures.org website.
AfCIt's not the definitive word by any stretch
AfC(it's based on a 1998 paper at LISA)
AfCbut it represents the knowledge and best practice of some of the leaders in the field globally.
AfCand it's a good place to learn that there's more to infrastructure management than meets the eye.
AfCWhich is the second last trend I want to point out - that there IS a body of knowledge out there about this, and several active global open source communities.
AfCI am frequently disappointed when I hear someone in a Linux User Group say "oh, you can manage that with a for loop and ssh", or hear a vendor say "oh, all you need is Jumpstart and everything will be taken care of"
AfCbut we're the ones who run the systems.... and it's our opportunity to do better.
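[For reference, the "for loop and ssh" approach looks roughly like the sketch below (hostnames invented). It is fine for a one-off command, but it captures no desired state, does nothing about hosts that fail or are down, and has to be reinvented for every task:]

    # The naive "for loop and ssh" approach: runs one command everywhere,
    # but captures no desired state and does nothing about hosts that fail.
    # Hostnames are invented for illustration.
    import subprocess

    HOSTS = [f"web{i:02d}.example.com" for i in range(1, 11)]

    for host in HOSTS:
        result = subprocess.run(["ssh", host, "uptime"],
                                capture_output=True, text=True)
        status = "OK" if result.returncode == 0 else "FAILED"
        print(f"{host}: {status} {result.stdout.strip()}")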
AfC[[[ almost done! http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img7.html ]]]
AfCThe last trend I want to suggest is really just some fortune telling, but I see it coming and so will make a prediction:
AfCgrid computing will change the nature of systems administration.
AfC[was just answering a few questions]
AfCSo here's something happening in the real world, today, at a major pharmaceutical company:
AfCthey have two 512-node cluster supercomputers (California and Geneva) and offices in California, Texas, and Geneva, each with 1000 desktops or so.
AfCthey *globally load balance*
AfCjobs
AfCand, when it's night in California and Texas, all the desktops there get annexed as processing nodes for the cluster in Geneva
AfCAnd it all Just Works (tm).
AfCNow, that's a clustering example, but it sets the tone. Computers aren't individual machines. Collectively, they represent "computing power", and given that you've paid for them, you might as well leverage them. Cost factors are driving people to do so.
AfCThe thing is that when all the machines in a department/organization/company/university/whatever start being able to be part of a virtual super computer on demand,
AfCthen suddenly the problem is no longer "I need 10 web servers", but "I need computing power with the following characteristics to fulfill the following web serving task"
AfCIt's a matter of policy definition.
AfCEven just in a data center or e-commerce platform, why should I say "this is web01, this is web02, ... this is web26"
AfCI really want to say:
AfC"make sure there are sufficient web servers always online to handle normal traffic. If a surge is detected, bring up more web server instances on other machines. Oh, and bring up another database replica as well."
AfCand, most of all, I want it to just figure that out by itself.
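[Expressed as code, such a policy might look something like the sketch below. The thresholds and the current_load/provision/retire helpers are all invented; the point is the shift from naming individual machines to stating desired capacity:]

    # Policy-driven capacity management: state the desired capacity and let
    # a control loop reconcile reality against it. All thresholds and the
    # helper functions are invented for illustration.
    MIN_WEB_SERVERS = 4
    REQUESTS_PER_SERVER = 500          # assumed comfortable load per instance

    def current_load():
        return 2600                    # would really query the load balancer

    def provision(role, count):
        print(f"bringing up {count} more {role} instance(s)")

    def retire(role, count):
        print(f"retiring {count} idle {role} instance(s)")

    def reconcile(running):
        wanted = max(MIN_WEB_SERVERS, -(-current_load() // REQUESTS_PER_SERVER))
        if wanted > running:
            provision("webserver", wanted - running)
        elif wanted < running:
            retire("webserver", running - wanted)
        return wanted

    running = reconcile(running=4)     # in practice this runs on a timer, forever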
AfCDo you see the fundamental shift in the nature of systems administration? It's still Unix; someone still has to configure Apache to work within whatever framework; someone still has
AfCto get the tools in place, but the management of the infrastructure *as a whole* becomes radically different.
AfCI think it's an amazing prospect. I think it will lead us to spending much less time mucking about with systems.
AfCBut, I am wary of one thing: the more abstracted a system becomes, the harder it will be to figure out what it is doing, and why.
AfCOf course, humans are already like that.
AfCI guess that's progress.
AfC[[[ http://www.operationaldynamics.com/reference/talks/TrendsUnixInfrastructure/img8.html ]]]
AfCThanks for listening!
AfCQuestions to #qc, I'll answer here.
AfCI hope you enjoyed the presentation.
AfC Don't hesitate to get in touch
AfC with me if you have any questions.
AfC-
AfC I wish you all a good night, and I hope you have happy holidays.
AfC-
AfC As for me, well, it's a beautiful morning here, and I think I'll head to the beach.
AfC[Oh, re the grid stuff, the SmartFrog link I mentioned is going in these lines already]
fernand0Nice talk!
fernand0Thanks!
fernand0plas plas plas plas plas plas plas plas plas plas plas
fernand0plas plas plas plas plas plas plas plas plas plas plas
AfC:)
fernand0plas plas plas plas plas plas plas plas plas plas plas
feistelplas plas plas plas plas plas plas plas
feistelplas plas plas plas plas plas plas plas
feistelplas plas plas plas plas plas plas plas
feistelplas plas plas plas plas plas plas plas
roelnice, thanks
AfCOne thing from the question channel:
AfC I would also observe that "recover my file that I accidentally deleted", and "oops, we lost the data center, now what?" are fundamentally different problems. Dealing with it is partly figuring out the separation of code/config/binaries (all replaceable) and data (irreplaceable)
fernand0plas plas plas plas plas plas plas plas plas plas plas
AfCall this configuration management and infrastructure magic is about the code part. You still have to work out an intelligent strategy for protecting your data. That's still hard. I guess that's the last trend :)
AfCAny other questions?
AfCComments?
AfCCome on feistel - I know you can't resist :)
rielinteresting talk
fernand0Well
fernand0thanks for your talk
fernand0we'll put the logs in our web as soon as possible
rielthank you AfC
AfCriel: thanks for that pointer.
truluxclap clap clap
