Wednesday, April 20, 2011

Artisan Server Crafting - A Dying Approach


Artisan server crafting is how many system administrators once approached the work of provisioning and caring for servers. Hardware would arrive on site. Engineers would unpack the new servers and marvel at their new toys: the latest in telescoping rail systems, collapsible cable management arms, and wicked cool blue LEDs. The servers would be lovingly racked and cabled, then something odd would happen. When reaching for the Brother P-Touch label maker, work would grind to a halt; “What are we going to name these servers?” someone would ask. A series of (sometimes heated) conversations would ensue while potential host names were debated...

“This one's going to be an SMTP relay so we should name it Cliff.”

“No! That wouldn't be in line with our convention of naming all servers after wizards.”

Eventually, agreement would be reached and the servers could finally be assigned names. This critical step allowed the rest of the work to begin: OS installation, configuration, networking setup, etc... Depending on the nature of the server's intended use, special care would be given to choosing the size of file systems and deciding exactly which services would or wouldn't be installed. Favorite tools would be downloaded and installed. User accounts would be created and the administrators would copy in their preferred local environment variables. Overall, great care would be exercised over each step of the server's creation.

Finally, the servers would be put into service and they would continue to be anthropomorphized over time...

“Oh no! Greyhawk is down to 10hp! We need to delete some old logs to free up space.”

“Gandalf is alive! We saved him from the brink of death after replacing his failing power supply.”

“Hey everybody, Merlin has been up for 500 days in a row! How should we celebrate?”

Servers would be cared for, patched, monitored, and generally looked after by loving administrators. The servers would develop their own personalities, quirks, and other endearing (or maddening) qualities. Eventually, everyone on the team would memorize the little quirks of each server and be able to care for each host for years to come.

Eventually, a few things happened. Distributed applications designed for horizontal scalability helped drive explosive server growth. Business needs drove increasing demand for faster and faster provisioning. Budget constraints limited the server-to-admin ratio. Increased turnover exposed the undocumented state of many environments. Imagine the situation facing a typical administrator: “Welcome aboard! We need 30 new web servers deployed this weekend.” Followed by; “And we need to change the NTP settings on 200 application servers in the next two days.” Followed by; “Bob just turned in his two week notice, so try to schedule some knowledge transfer sessions before he leaves.”

At the same time, server hardware became increasingly commoditized and less expensive. Server virtualization became more widely accessible and less expensive. Provisioning tools became more sophisticated and less expensive. Right when the administrators needed it most, solutions were becoming available. Infrastructure automation became a real possibility! Actually, infrastructure automation became a real necessity!

Unfortunately, some shops still aren't getting it. They cling to hand-crafted artisan ways of the past. They blame the poor planning of project teams that don't provide enough time for server provisioning. They bemoan the budget constraints that keep them from hiring enough people to manage the hundreds or thousands of servers they're responsible for operating. What these shops need to consider is abandoning their server friends and embracing the notion of disposable nodes.
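To make the contrast concrete, here's a minimal sketch of how disposable nodes get their names: generated from a role and an index, with no debate and no sentiment attached. This is my own illustration, not any particular provisioning tool's API.

```python
def node_names(spec):
    """Generate disposable, role-based host names like web-001 from {role: count}."""
    names = []
    for role, count in sorted(spec.items()):
        names.extend(f"{role}-{i:03d}" for i in range(1, count + 1))
    return names

# "We need 30 new web servers deployed this weekend" becomes:
names = node_names({"web": 30, "app": 4})
print(names[0], names[-1], len(names))  # → app-001 web-030 34
```

When one of these nodes misbehaves, nobody nurses it back to health like Gandalf; it gets destroyed and web-017 is simply rebuilt from the same recipe.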

Now, what am I going to use for the SSID of my wireless LAN? Hmmm, let me think about that for a while... :-)

Saturday, March 12, 2011

Technology Aligned Provisioning PLUS Platform Aligned Management?


In my previous post, I discussed the common approaches of using technology aligned teams versus product aligned teams. Both approaches have their pros and cons, but what if there was an approach that combined the best of both worlds?

Most of the pros related to technology aligned teams are manifested during the design, purchasing, and initial provisioning stage. Most of the cons surface after the platform is in production and on-going maintenance or business-driven changes are necessary. On the other hand, platform aligned teams usually excel when responding to evolving business needs, but come up short (from an enterprise perspective) when it comes to procurement and enterprise-aware design decisions. You probably see where I'm headed...

Let's have the technology aligned teams focus on their strengths: centralized purchasing power, enterprise-aware (but not constrained) design, and initial provisioning. After the platform is moved into production, it's turned over to a platform focused team that provides the on-going care and feeding, business-driven changes, and future upgrades.

In this model, it's likely an organization could reduce the size of the technology aligned teams over time. Instead of two or three go-to people surrounded by extra hands, leverage the knowledge of the strong people and eliminate the need for a large team. Let the platform-specific teams handle the on-going maintenance using change schedules that suit their specific business needs.

The cons? This approach requires a high percentage of skilled people. The technology focused teams need to be really good at what they do. They can't be composed of one or two good people surrounded by extra hands. They need to create frameworks capable of being turned over to other teams and they need to be capable of listening and adjusting to the sometimes necessary differences required by different product platforms. Similarly, the platform focused teams inheriting the infrastructure need to have the maturity and ability to manage the platform without re-inventing the whole thing after they get their hands on it. The platform teams will likely need to be composed of renaissance engineers in order to allow a relatively small team to manage the wide variety of components necessary to deliver an IT product.

In addition to the strong people, a high degree of trust and communication between the teams is an absolute must. If the teams do not trust and respect each other, this model crumbles. As teams and organizations expand in size, trust and communication are extremely difficult to maintain. Have you seen this model in action? Which aspects actually worked well and which stumbled?

Sunday, February 13, 2011

Technology Aligned Teams vs. Platform Aligned Teams


As a consultant, I had the opportunity to work within a wide variety of organizations. From start-ups to Fortune 100; from banking to transportation. It was interesting to see the wide variety of organizational alignments devised by different companies in different industries. Despite the endless variety and nuance, there seemed to be a couple of common themes across companies regardless of the specific industry in which they operated.


Technology Aligned Teams

Large IT organizations are often divided into teams based on the technology they are responsible for supporting. A networking team manages switches and routers, a desktop team takes care of workstations, a storage team manages large disk arrays, SANs, and NAS, and so on.

Such an alignment used to make sense. The technologies being deployed were often new and poorly documented, products were complicated to set up and required a significant investment in technical knowledge, and diagnosing issues was almost a black art for some areas. This situation suggests the need to hire engineers who are narrowly focused on very specific technologies. And that's what many companies did. (And still do.)

The benefits and drawbacks to this approach are well known. Benefits are supposed to include cross-training, a deep bench, best-of-breed practices, and an enterprise-level perspective on architecture, procurement, and issue resolution. Drawbacks include IT departments requiring architecture review boards, one-size-fits-all configurations, and change management windows that may not suit the needs of individual business units. Implementations for fast moving products end up waiting in line behind slower business units, and unintentional dependencies emerge across logically separate product lines. For example: load balancers that are shared across multiple product lines. One product needs a load balancer change implemented, but must wait for the weekly change window; otherwise, other products could be negatively impacted by the 'unscheduled' change.

In reality, the cross-training seldom happens. Each team has one or two go-to people surrounded by a bunch of extra hands. The idea of having multiple network engineers is great, but when things are in the ditch, most companies grab the go-to person to save their hide. Over time, teams become increasingly focused on the technology they manage instead of the business products the technology is supposed to be enabling.


Platform Aligned Teams

An alternative approach in some organizations is to create mini-IT departments assigned to a specific product or business unit. The team might include a server and database administrator, a networking engineer, a web server expert, and other skills needed to support a particular platform or business unit. Many times, the people have multi-faceted skills like the combination UNIX/firewall/load-balancer/apache/networking administrator or the DBA/performance tuner/SAN/backup engineer.

Again, the benefits and drawbacks to this approach are well known. Because this mini-IT group is often allowed to operate with more latitude compared to a centralized IT department, they are better positioned to respond to the needs of the business (read: customer).

While this approach has many benefits at a product or business unit level, analyzing the enterprise-wide cost of such an approach can paint a costly picture. Different groups may select different vendors, economies of scale such as price negotiation are diminished, and not all mini-IT groups have the same level of skill.* Risk mitigation may require more focus to avoid the scenario where configurations introduced by one group have a very detrimental effect across an entire company. For example: not properly maintaining a strong security posture or disrupting a network routing algorithm.

* In the past, it was very difficult to find that magic collection of engineers with the wide variety of experience necessary to build a strong mini-IT group. The maturation of most products and the increasing number of renaissance engineers (engineers with skills across a wide variety of disciplines) has definitely improved the viability of this approach.


Summary

Between the two approaches described so far, my experience would suggest platform aligned teams are more practical and deliver better results for customers. I think this is especially true in a turnaround scenario (rescuing a failing division or subsidiary) as well as for business units focused on growth or new markets. Products and business teams that exist in a saturated or slowly shrinking market might be okay with the traditional technology aligned approach -assuming the goal is to simply ride out a cash cow or avoid pursuit of significant strategic improvements in order to prepare for a sell-off.

Next time: What if there was an approach that combined the best of both worlds?


Sunday, January 16, 2011

Implementation is More Important Than Product Selection (Part 2 of 2)


In my previous post, I described a common scenario an organization experiences when acquiring a new IT component. It's easy to point out flaws and criticize companies when you're a single actor in the equation, but the fact is, no single person possesses all of the information related to all of the competing business drivers. I've heard plenty of self-righteous water cooler conversations from smart people who describe how things should have been done. Their theories are good and certainly suggest a better outcome than the most recent experience, but they usually lack pragmatism.

Over the course of my career, I have fulfilled all of the roles described in the earlier procurement scenario. I was the IT engineer asked to sit with the vendor to watch everything they did and learn over their shoulder -only to have my manager redeploy me to a new fire. I've had to be the manager responsible for pulling someone off such an engagement because they needed to address an urgent issue elsewhere within the company. I've been the professional services expert that's hired to come in, set up the component, configure it for operation, and cross-train the existing staff. I've taken people on the golf trips and been taken on the golf trips. I've also been the purchasing decision maker responsible for the budget and the director pounding on the table because “we need it in production now!”

Participating in this scenario from so many different angles has taught me a lot. A couple of things that come to mind here include...


Make key decisions faster

Most IT components have matured to a point that they are capable of supporting all of the use case scenarios likely to be encountered by a typical company. Obviously, there are exceptions, but whether it's a firewall, switch, server, storage array, or other device, most companies will do fine selecting from any of the top three vendors in any category.

Not even a perfect product selection will succeed if the underlying architecture is flawed or your people don't know how to manage it.

The point is: organizations can invest too much time deliberating over which vendor to select. There's a point after which additional research, discussions, tests, and other activities simply do not improve the quality of a decision.


Allocate time where it will have the biggest impact.

In the previous example, a new component was evaluated, selected, and installed over the course of six months. The team invested five months in product selection and a week or two in implementation. The motivation is obvious; “If we're going to spend blah dollars on this thing, we want to get it right.” Again, most organizations invest too much time trying to select the perfect product for their needs when any of the top three vendors will do fine.

Instead of investing 90% of the time on selection and acquisition only to leave 10% for implementation, I suggest spending at least as much time on implementation as you did on selection. In the previous example, that means investing two months on product selection, acquiring the product, and moving forward. Then spend two to three months on configuration and testing. Let all of the engineers spend time with the product and truly learn it. Let them learn how to build it up from scratch by themselves. Force it to break. Understand what can make it break and how to repair it. If it's hardware, actually go through the hot-swap procedures of all subsystems. Make sure your team knows this thing inside and out since they'll be supporting it once it goes into production.

Support your team, and your organization, by allocating and protecting the time necessary to truly learn, correctly configure, and confidently deploy new technologies. These aspects of deployment are far more critical than product selection and will directly impact an organization's ability to effectively leverage the technology for years to come.

Sunday, December 19, 2010

Implementation is More Important Than Product Selection (Part 1 of 2)


Far too often, I see IT professionals invest a significant amount of time evaluating products for a specific solution. Vendors are drilled on the capabilities of their products. Engineers fire off questions they hope will make them look smart in front of their bosses and point out holes in the vendors' solution; “So, you're saying your firewall appliance doesn't support an enterprise wide CORBA bus!” Never mind the engineer knows perfectly well their company doesn't have a CORBA bus, doesn't plan to use such an architecture, and nobody else does either. Demo units are brought in, tests are haphazardly conducted, the field is narrowed to top industry leaders, fiscal year-end concessions are made by a vendor, someone goes on a golf trip, and finally; a product is selected.

After five months of meetings and evaluations, the money has been spent. Management wants the shiny new product in use immediately. “We didn't spend blah dollars on this solution to have it sit around on a shelf. We need it in production now!” Because the local folks aren't experts yet, the vendor supplies a professional services contractor (probably a subcontractor from one of their resellers) to “install, configure, and perform knowledge transfer” with the team.

The PS guy shows up and is guided to the conference room that's been reserved for the whole week so the team can watch and learn how to perform installation, configure the device, and receive knowledge transfer. Four hours into day one, the local folks are pulled off to fight some IT fire. The PS guy forges ahead, begs for business requirements, completes the work as best he can, writes up some documentation, and flies out early on Friday.

The implementation is finished.

At this point, the company relies heavily on a piece of equipment their own people don't understand very well. The device is in the critical path of business operations; therefore, it's “hands off!” Changes are approached timidly with fingers crossed instead of confidence. Over time, the poor understanding leads to unintentionally poor configurations. The device begins to act unexpectedly. It begins to develop a negative reputation because it's “unstable” or “unpredictable” or “doesn't support feature x”.

Of course, nobody suggests it's being managed poorly -that the people don't know what they're doing. That might hurt somebody's feelings. Besides, they know they have good people. The folks have a good attitude, work long hours, and put out a lot of fires. “That piece of junk is causing us all kinds of headaches. It needs to be replaced!”

And the whole cycle starts over again...

Next Post: Is there a better way?

Sunday, November 28, 2010

DevOpsATL Meetup Group Launched!


It's been great working within the Atlanta IT community. I've had the pleasure of working with some very talented and professional colleagues. Recently, I had the opportunity to reconnect with an old friend. This chance meeting eventually led to the creation of the DevOpsATL meet-up group...


Earlier this year, I signed up to attend the 2010 O'Reilly Velocity Web Performance and Operations Conference. As the event drew closer, I checked the conference sign-up list to see if anyone I happened to know would be going. To my surprise, there was an old name I hadn't seen in years. While catching up at the conference, he introduced another friend of his and the three of us started discussing the idea of creating a meet-up group to serve the Atlanta community. A few weeks after the conference, everyone actually followed through and created the DevOpsATL group on meetup.com.


All of the DevOpsATL organizers are great to work with and have a lot of practical experience to offer. We've had a couple of meet-ups already and I am quite impressed at the quality of people who've decided to join the conversation. We're really looking forward to giving back to the Atlanta IT community, so if you're in the area during one of our meet-up dates, stop in for some compelling conversations.


Friday, July 23, 2010

(too) strong passwords = less security


There are many effective ways to secure access to computer systems, but I suspect most users' access to most computer systems is still largely controlled by a password. With all of the great alternatives to single passwords out there, I wouldn't defend this fact as acceptable; rather, it's simply very common. And of course, if a user is required to utilize a password, it may be written down somewhere. Again, not acceptable, but quite common.

Not me! I'm smart. I work in IT. I've secured networks and servers myself. I would never write down my password!

So, information security professionals insist passwords should meet certain strength requirements so users don't use "password" for their password. Okay, strength requirements seem rational: a minimum of 8 characters, a mix of alphas and numbers, and a change every 90 days. Their boss says; "Good work. It meets our audit requirements. Keep it up."

As an IT professional who generally supports efforts to secure our assets, I go along. I dutifully dream up a password and remember it without writing it down. Then, I log into the 26 different systems that authenticate me separately and set my new password. (Yes, it would be more secure to use 26 separate passwords, but I'll save my rant on the history (pipe dream) of centralized authentication and single sign-on for another day.)

So, those information security people figure they should make the password requirements even stronger: a minimum of 8 characters, a mix of at least 2 lowercase alphas, 2 uppercase alphas, 2 numbers, and 2 special characters, and a change every 60 days. Their boss says; "You guys are great! I can tell the auditors we're even strongerly protected!"

Groan. Okay. I dream up an algorithm to use that fits the criteria and begin setting my 26 accounts. But, three of the systems don't support special characters. Doh! Now I'm forced to do password forking. I adjust my algorithm to account for the lack of special char support and know what to substitute when I hit those systems.

So, those information security jerks figure they should make it even stronger so they can justify their jobs: a minimum of 10 characters, a mix of at least 2 lowercase alphas, 2 uppercase alphas, 2 numbers, and 2 special characters, a change every 30 days, and no character re-used in the same position for 12 months (i.e., AppL#14! cannot replace 0p&n23T# since a "p" is re-used in the second position). Their boss says; "You guys are totally rad! I bet I can score a date with that cute auditor with security hotness like this!"
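For illustration, that escalating policy can be written down as a couple of checks. This is my own sketch of the rules as described (10+ characters, at least two of each character class, no character repeated in the same position as the previous password), not any real security product's logic:

```python
import string

def strong_enough(pw):
    """Check the policy sketch: 10+ chars, at least 2 lower, 2 upper, 2 digits, 2 specials."""
    specials = set(string.punctuation)
    return (len(pw) >= 10
            and sum(c.islower() for c in pw) >= 2
            and sum(c.isupper() for c in pw) >= 2
            and sum(c.isdigit() for c in pw) >= 2
            and sum(c in specials for c in pw) >= 2)

def reuses_position(new, old):
    """True if any character appears in the same position in both passwords."""
    return any(a == b for a, b in zip(new, old))

print(reuses_position("AppL#14!", "0p&n23T#"))  # → True ("p" re-used in position 2)
print(strong_enough("AppL#14!"))                # → False (only 8 characters)
```

Of course, a real implementation would have to compare against every password from the past 12 months, which is exactly the kind of bookkeeping that drives users toward the Post-It note.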

Groan. My algorithms are shot. I find four systems that can't support passwords over 8 chars. Of course they only partially overlap with those that don't support special chars. Now I need 4 separate passwords every 30 days.

Then, those information security jerks disallow any repeating chars (like the "pp" in AppL#14!). The interval is reduced to 28 days, so my first of the month calendar reminder no longer works. The password event starts arriving earlier each month. Their boss says; "We love you! Which certification would you like the company to pay for you to pursue?"

I find systems that allow over 8 chars, but only enforce rules on the first 8, so a special char in position 9 isn't recognized. I invest hours writing expect scripts to automate this monthly hell, but some systems with Java GUIs are difficult to script against -not impossible, just not worth the time.

So, they do some more stupid stuff. Their boss says; "You're promoted!"

I give up. The Post-It note goes in my wallet.

I suspect there are others that share my frustration. I'm willing to support security efforts, but at some point, they become completely impractical and, ultimately, weaken the overall security posture of the assets they are trying to protect.

Have you experienced anything similar?