Wednesday, April 20, 2011

Artisan Server Crafting - A Dying Approach


Artisan server crafting is how many system administrators once approached the work of provisioning and caring for servers. Hardware would arrive on site. Engineers would unpack the new servers and marvel at their new toys: the latest in telescoping rail systems, collapsible cable management arms, and wicked cool blue LEDs. The servers would be lovingly racked and cabled, and then something odd would happen. Just as someone reached for the Brother P-Touch label maker, work would grind to a halt. “What are we going to name these servers?” a teammate would ask. A series of (sometimes heated) conversations would ensue while potential host names were debated...

“This one's going to be an SMTP relay so we should name it Cliff.”

“No! That wouldn't be in line with our convention of naming all servers after wizards.”

Eventually, agreement would be reached and the servers could finally be assigned names. This critical step allowed the rest of the work to begin: OS installation, configuration, networking setup, etc. Depending on the server's intended use, special care would be given to choosing file system sizes and exactly which services would or wouldn't be installed. Favorite tools would be downloaded and installed. User accounts would be created, and the administrators would copy in their preferred local environment variables. Overall, great care would be exercised over each step of the server's creation.

Finally, the servers would be put into service, and they would continue to be anthropomorphized over time...

“Oh no! Greyhawk is down to 10hp! We need to delete some old logs to free up space.”

“Gandalf is alive! We saved him from the brink of death after replacing his failing power supply.”

“Hey everybody, Merlin has been up for 500 days in a row! How should we celebrate?”

Servers would be cared for, patched, monitored, and generally looked after by loving administrators. The servers would develop their own personalities, quirks, and other endearing (or maddening) qualities. In time, everyone on the team would memorize each server's little quirks and be able to care for each host for years to come.

Then a few things changed. Distributed applications designed for horizontal scalability helped drive explosive server growth. Business needs drove demand for ever-faster provisioning. Budget constraints kept staffing flat, driving up the number of servers each administrator was responsible for. Increased turnover exposed the undocumented state of many environments. Imagine the situation facing a typical administrator: “Welcome aboard! We need 30 new web servers deployed this weekend.” Followed by: “And we need to change the NTP settings on 200 application servers in the next two days.” Followed by: “Bob just turned in his two-week notice, so try to schedule some knowledge transfer sessions before he leaves.”

At the same time, server hardware became increasingly commoditized and less expensive. Server virtualization became more widely accessible and less expensive. Provisioning tools became more sophisticated and less expensive. Right when the administrators needed it most, solutions were becoming available. Infrastructure automation became a real possibility! Actually, infrastructure automation became a real necessity!

Unfortunately, some shops still aren't getting it. They cling to the hand-crafted, artisan ways of the past. They blame poor planning by project teams that don't allow enough time for server provisioning. They bemoan the budget constraints that keep them from hiring enough people to manage the hundreds or thousands of servers they're responsible for operating. What these shops need to consider is abandoning their server friends and embracing the notion of disposable nodes: servers that are built by automation and can be replaced at will rather than nursed along by hand.
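To make the contrast concrete, here is a minimal sketch of what treating nodes as disposable can look like. Names are generated rather than debated. A setting such as NTP is rendered once from a single template for the whole fleet instead of being edited by hand on each server. The sketch rests on several assumptions:

* The `<role>-<NNN>` naming scheme is hypothetical.
* The NTP hosts (`ntp1.example.com`, `ntp2.example.com`) are placeholders.
* The helpers `node_names` and `render_ntp_conf` are illustrative and not part of any real provisioning tool.

```python
"""Sketch: nodes named by role and sequence, configured from one template.

Hypothetical throughout: the "<role>-<NNN>" naming scheme, the NTP
server names, and both helper functions exist only for illustration.
"""


def node_names(role: str, count: int, start: int = 1) -> list[str]:
    # Names are derived mechanically from role and index, so nobody
    # has to argue about wizards before the servers can be labeled.
    return [f"{role}-{i:03d}" for i in range(start, start + count)]


def render_ntp_conf(ntp_servers: list[str]) -> str:
    # One template serves every node: changing NTP for 200 servers
    # means changing this input once and re-applying it everywhere.
    lines = [f"server {s} iburst" for s in ntp_servers]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    web = node_names("web", 30)
    app = node_names("app", 200)
    conf = render_ntp_conf(["ntp1.example.com", "ntp2.example.com"])
    # A real tool would push `conf` to each node; here we only report it.
    print(f"{len(web)} web nodes: {web[0]} .. {web[-1]}")
    print(f"{len(app)} app nodes share:\n{conf}", end="")
```

In practice, a configuration management or provisioning tool would push the rendered file to every node. The point of the sketch is that the change happens once, in one place, and any node can be rebuilt from the same inputs.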

Now, what am I going to use for the SSID of my wireless LAN? Hmmm, let me think about that for a while... :-)

Saturday, March 12, 2011

Technology Aligned Provisioning PLUS Platform Aligned Management?


In my previous post, I discussed the common approaches of using technology aligned teams versus platform aligned teams. Both approaches have their pros and cons, but what if there were an approach that combined the best of both worlds?

Most of the pros related to technology aligned teams are manifested during the design, purchasing, and initial provisioning stage. Most of the cons surface after the platform is in production and on-going maintenance or business-driven changes are necessary. On the other hand, platform aligned teams usually excel when responding to evolving business needs, but come up short (from an enterprise perspective) when it comes to procurement and enterprise-aware design decisions. You probably see where I'm headed...

Let's have the technology aligned teams focus on their strengths: centralized purchasing power, enterprise-aware (but not constrained) design, and initial provisioning. After the platform is moved into production, it's turned over to a platform focused team that provides the on-going care and feeding, business-driven changes, and future upgrades.

In this model, an organization could likely reduce the size of the technology aligned teams over time. Instead of keeping one or two go-to people surrounded by extra hands, leverage the knowledge of the strong people and eliminate the need for a large team dedicated to on-going maintenance. Let the platform specific teams handle the on-going maintenance using change schedules that suit their specific business needs.

The cons? This approach requires a high percentage of skilled people. The technology focused teams need to be really good at what they do. These teams can't be composed of one or two good people surrounded by extra hands. They need to create frameworks capable of being turned over to other teams, and they need to be capable of listening to and accommodating the differences that individual product platforms sometimes legitimately require. Similarly, the platform focused teams inheriting the infrastructure need the maturity and ability to manage the platform without re-inventing the whole thing once they get their hands on it. The platform teams will likely need to be composed of renaissance engineers so that a relatively small team can manage the wide variety of components necessary to deliver an IT product.

In addition to strong people, a high degree of trust and communication between the teams is an absolute must. If the teams do not trust and respect each other, this model crumbles. As teams and organizations grow, trust and communication become increasingly difficult to maintain. Have you seen this model in action? Which aspects worked well and which stumbled?

Sunday, February 13, 2011

Technology Aligned Teams vs. Platform Aligned Teams


As a consultant, I had the opportunity to work within a wide variety of organizations, from start-ups to the Fortune 100 and from banking to transportation. It was interesting to see the range of organizational alignments devised by companies in different industries. Despite the endless variety and nuance, a couple of common themes seemed to recur across companies regardless of their industry.


Technology Aligned Teams

Large IT organizations are often divided into teams based on the technology they are responsible for supporting. A networking team manages switches and routers, a desktop team takes care of workstations, a storage team manages large disk arrays, SANs, and NAS, and so on.

Such an alignment used to make sense. The technologies being deployed were often new and poorly documented, products were complicated to set up and required a significant investment in technical knowledge, and diagnosing issues was almost a black art in some areas. This situation suggested the need to hire engineers narrowly focused on very specific technologies. And that's what many companies did. (And still do.)

The benefits and drawbacks of this approach are well known. Benefits are supposed to include cross-training, a deep bench, best-of-breed practices, and an enterprise-level perspective on architecture, procurement, and issue resolution. Drawbacks include IT departments requiring architecture review boards, one-size-fits-all configurations, and change management windows that may not suit the needs of individual business units. Implementations for fast-moving products end up waiting in line behind slower business units, and unintentional dependencies emerge across logically separate product lines. Consider, for example, load balancers shared across multiple product lines. One product needs a load balancer change implemented but must wait for the weekly change window; otherwise, other products could be negatively impacted by the 'unscheduled' change.

In reality, the cross-training seldom happens. Each team has one or two go-to people surrounded by a bunch of extra hands. The idea of having multiple network engineers is great, but when things are in the ditch, most companies grab the go-to person to save their hide. Over time, teams become increasingly focused on the technology they manage instead of the business products the technology is supposed to be enabling.


Platform Aligned Teams

An alternative approach in some organizations is to create mini-IT departments assigned to a specific product or business unit. The team might include a server and database administrator, a networking engineer, a web server expert, and other skills needed to support a particular platform or business unit. Many times, the people have multi-faceted skills, like the combination UNIX/firewall/load-balancer/Apache/networking administrator or the DBA/performance tuner/SAN/backup engineer.

Again, the benefits and drawbacks of this approach are well known. Because this mini-IT group is often allowed to operate with more latitude than a centralized IT department, it is better positioned to respond to the needs of the business (read: customer).

While this approach has many benefits at a product or business unit level, an analysis of its enterprise-wide cost can paint an expensive picture. Different groups may select different vendors, economies of scale such as price negotiation are diminished, and not all mini-IT groups have the same level of skill.* Risk mitigation may require more focus to avoid scenarios where configurations introduced by one group have a severely detrimental effect across the entire company. Examples include failing to maintain a strong security posture or disrupting network routing.

* In the past, it was very difficult to find that magic collection of engineers with the wide variety of experience necessary to build a strong mini-IT group. The maturation of most products and the increasing number of renaissance engineers (engineers with skills across a wide variety of disciplines) have definitely improved the viability of this approach.


Summary

Between the two approaches described so far, my experience suggests platform aligned teams are more practical and deliver better results for customers. I think this is especially true in a turnaround scenario (rescuing a failing division or subsidiary), as well as for business units focused on growth or new markets. Products and business teams in a saturated or slowly shrinking market might be okay with the traditional technology aligned approach. That assumes the goal is simply to ride out a cash cow, or to avoid pursuing significant strategic improvements in preparation for a sell-off.

Next time: What if there were an approach that combined the best of both worlds?


Sunday, January 16, 2011

Implementation is More Important Than Product Selection (Part 2 of 2)


In my previous post, I described a common scenario organizations experience when acquiring a new IT component. It's easy to point out flaws and criticize companies when you're a single actor in the equation, but the fact is, no single person possesses all of the information related to the competing business drivers. I've heard plenty of self-righteous water cooler conversations from smart people describing how things should have been done. Their theories are good and certainly suggest a better outcome than the most recent experience, but they usually lack pragmatism.

Over the course of my career, I have filled all of the roles described in the earlier procurement scenario. I was the IT engineer asked to sit with the vendor, watch everything they did, and learn over their shoulder, only to have my manager redeploy me to a new fire. I've been the manager responsible for pulling someone off such an engagement because they needed to address an urgent issue elsewhere within the company. I've been the professional services expert hired to come in, set up the component, configure it for operation, and cross-train the existing staff. I've taken people on the golf trips and been taken on the golf trips. I'm also a purchasing decision maker responsible for the budget, and the director pounding on the table because “we need it in production now!”

Participating in this scenario from so many different angles has taught me a lot. A couple of things that come to mind here include...


Make key decisions faster

Most IT components have matured to a point that they are capable of supporting all of the use case scenarios likely to be encountered by a typical company. Obviously, there are exceptions. Still, whether it's a firewall, switch, server, storage array, or other device, most companies will do fine selecting from any of the top three vendors in any category.

Not even a perfect product selection will succeed if the underlying architecture is flawed or your people don't know how to manage it.

The point is: organizations can spend too much time vacillating over which vendor to select. Beyond a certain point, additional research, discussions, tests, and other activities simply do not improve the quality of a decision.


Allocate time where it will have the biggest impact

In the previous example, a new component was evaluated, selected, and installed over the course of six months. The team invested five months in product selection and a week or two in implementation. The motivation is obvious: “If we're going to spend blah dollars on this thing, we want to get it right.” Again, most organizations invest too much time trying to select the perfect product for their needs when any of the top three vendors will do fine.

Instead of spending 90% of the time on selection and acquisition and leaving only 10% for implementation, I suggest spending at least as much time on implementation as on selection. In the previous example, that might mean investing two months in product selection, acquiring the product, and moving forward. Then spend two to three months on configuration and testing. Let all of the engineers spend time with the product and truly learn it. Let them learn how to build it up from scratch by themselves. Force it to break. Understand what can make it break and how to repair it. If it's hardware, actually go through the hot-swap procedures for all subsystems. Make sure your team knows this thing inside and out, since they'll be supporting it once it goes into production.

Support your team, and your organization, by allocating and protecting the time necessary to truly learn, correctly configure, and confidently deploy new technologies. These aspects of deployment are far more critical than product selection and will directly impact an organization's ability to effectively leverage the technology for years to come.