Friday, May 14, 2010

Cisco UCS – Part 1

In this post I’m going to take a look at the components of a UCS deployment, and to that end I’m going to break this into 3 separate blog posts.

  1. The first part will cover hardware components
  2. The second part will cover the Software components and
  3. Finally we’ll look at putting it all together and some of the points for consideration in doing that

In this post I’ll deal with part 1 (hardware), drawing together all the references I’ve found since I started looking into UCS in the middle of 2009, along with the information I’ve gathered from knowledgeable people in and out of Cisco via various sources (in particular Twitter and Google Wave).

I’ve put links to the documents, videos and 3-D interactive views that cover the components into this blog. I think that by including those 3 elements (where possible) you will better see and understand the components and their relationship to each other.

Cisco UCS is an example of deploying FCoE at the server edge as mentioned here and is the first step in FCoE spreading to the data center core.

Hardware components of UCS

So starting with the hardware, the components that make up a UCS deployment are (see the diagram below) –

  • CNA (Converged Network Adapters)
  • Blades (B200/B250 M1 blade)
  • Chassis (5108)
  • Fabric Extender (aka IOM, 2100 series)
  • Fabric Interconnects (6120/6140 models)
  • Expansion Modules (SAN/LAN – For data center connectivity)


Please open the interactive 3-D Model (here) in another window as you go through this section.

CNA (Converged Network Adapters)

In previous posts we mentioned that unified fabric is part of Cisco’s data center 3.0 strategy. One of the core elements of this strategy is the CNA (Converged Network Adapter), which enables previously disparate technologies – FC and LAN – to run on the same physical card in the blade (or server), out to an upstream switch over the same cable at 10GigE speeds (in UCS the upstream switch is the Fabric Interconnect).

Not only does this allow the fabrics (FC and LAN) to be unified, it also allows a number of 10/100/1000 LAN connections to be consolidated into a single 10GigE pipe.

There are 3 types of CNA which can be deployed in the blades:-

  • Cisco UCS M81KR dual 10gig virtual interface card

This is very much a next-generation adapter because not only does it support Cisco’s tenet of unified fabric at 10GigE speeds, it also provides a layer of virtualisation of the physical adapter to the blade in which it sits. It has 2*10GigE ports that connect via the chassis backplane to the fabric extenders in the chassis (1 port to each fabric extender) for uplink purposes.

Due to the ability to virtualise the physical hardware card, it can provide up to 128 virtual PCIe interfaces (58 in practice, due to Fabric Interconnect restrictions) to the blade, either as vNICs (58, or 56 if vHBAs are used) or as vHBAs (2). These virtual interfaces can be dynamically configured, which is particularly useful for VMware environments.

So for example, if your blade had one of these cards you could present a total of 58 virtual PCIe devices – LAN NICs or SAN HBAs – to a bare-metal OS or to a hypervisor. This is very useful if you want separation of devices at the OS/hypervisor level without the OS or hypervisor ever knowing that the virtual PCIe interfaces actually come from a single physical adapter in the blade. For example, I could present 4 vNICs – 1 for backup, 1 for mgmt and 2 for user access – to a bare-metal OS without it knowing that in reality they all come from one physical card in the blade.
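To make that carving-up concrete, here is a minimal Python sketch of the idea. It is illustrative only – the class and names are hypothetical, not the UCSM object model – with the 58-interface ceiling from above as the only constraint.

```python
# Hypothetical model of carving one virtual interface card (VIC) into
# virtual PCIe devices -- illustrative only, not the UCSM object model.

MAX_VIRTUAL_INTERFACES = 58  # ceiling imposed by the Fabric Interconnect


class VirtualInterfaceCard:
    def __init__(self):
        self.vnics = []  # virtual NICs presented to the OS/hypervisor
        self.vhbas = []  # virtual FC HBAs presented to the OS/hypervisor

    def _check_capacity(self):
        if len(self.vnics) + len(self.vhbas) >= MAX_VIRTUAL_INTERFACES:
            raise ValueError("exceeds the 58 virtual interfaces available "
                             "from one adapter")

    def add_vnic(self, name):
        self._check_capacity()
        self.vnics.append(name)

    def add_vhba(self, name):
        self._check_capacity()
        self.vhbas.append(name)


# The example from the text: 4 vNICs (backup, mgmt, 2x user access) plus
# 2 vHBAs, all carved from the same physical adapter in the blade.
card = VirtualInterfaceCard()
for nic in ("backup", "mgmt", "user-access-1", "user-access-2"):
    card.add_vnic(nic)
for hba in ("fc0", "fc1"):
    card.add_vhba(hba)

print(f"{len(card.vnics)} vNICs and {len(card.vhbas)} vHBAs "
      "presented from one physical card")
```

The OS or hypervisor simply sees six independent PCIe devices; the single 2*10GigE physical card underneath is invisible to it.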

Link - here

  • Qlogic/Emulex CNA

These are converged FCoE network adapters which have 2*10GigE ports from the blade, 1 to each Fabric Extender (IOM) via the chassis backplane for uplink. The card presents (virtually) 2*4Gb FC ports (vHBAs) and 2*10GigE LAN ports (vNICs) down to the OS on the blade.

Emulex M71KR-E Link - here

Qlogic M71KR-Q Link - here

  • Cisco UCS 82598KR-CI 10GigE Adapter

This is a 10GigE network adapter which has 2*10GigE ports from the blade, 1 to each Fabric Extender (IOM) via the chassis backplane for uplink, and presents 2 LAN ports only down to the OS on the blade (designed for low latency – 2 ports up, 2 ports down).

Link - here

Blade Servers

There are 2 blade servers that are presently available for deployment into a UCS chassis.

  • B200 M1

This is a half-width blade that has 2 Intel Xeon 5500 (Nehalem) sockets and 12 DDR3 DIMM slots, which means a maximum of 96GB of memory. Using any of the CNAs you have 20Gb/s of I/O (10Gb/s per port) to the Fabric Extenders in the chassis for uplink. This blade takes 1 dual-port CNA. It also supports 2 (optional) SAS drives.

In a blade chassis you could fit 8 of these blades.

Link - here


  • B250 M1

This is a full-width blade that has 2 Intel Xeon 5500 (Nehalem) sockets and 48 DDR3 DIMM slots, which means a maximum of 384GB of memory. It supports 2 dual-port CNAs, meaning 40Gb/s of I/O (10Gb/s per port), and 2 optional SAS drives.

One of the leading features of this blade is the massive increase in memory compared with other blades. Standard blades normally have 9 DIMM slots per CPU, so other Xeon 5500 blades have 18 slots (2 CPUs) in total, which with 8GB DIMMs is a maximum of 144GB. It is worth noting that the average memory deployed in most blades today is around 48GB; however, with increasingly large virtualization projects, the more memory you have the better, as VMs tend to be memory hungry.

What Cisco have done is add more memory slots (30 extra), making a total of 48 memory slots in the server, and these sit behind an ASIC (Application Specific Integrated Circuit) placed between the slots and the memory controller. The ASIC presents every 4 slots of 8GB DIMMs to the memory controller as a single 32GB DIMM. So with 48 slots that is (48/4) * 32GB = 384GB in the blade.

One of the additional benefits of this (aside from the potential for more VMs) is that even if you don’t require 384GB in the server, you can get 192GB by using 4GB DIMMs and thereby achieve higher memory density than most other blades at a lower cost point.
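A quick back-of-the-envelope check of those memory figures – a sketch only, with the 4-slots-per-logical-DIMM grouping taken from the description above:

```python
# Back-of-the-envelope check of the B250 M1 memory figures described above.

slots = 48               # physical DIMM slots on the B250 M1
slots_per_logical = 4    # the ASIC presents every 4 slots as one logical DIMM


def total_memory_gb(dimm_size_gb):
    logical_dimms = slots // slots_per_logical           # 12 logical DIMMs
    logical_dimm_gb = dimm_size_gb * slots_per_logical   # e.g. 4 x 8GB -> 32GB
    return logical_dimms * logical_dimm_gb


print(total_memory_gb(8))   # 384 GB with 8GB DIMMs
print(total_memory_gb(4))   # 192 GB with 4GB DIMMs

# For comparison, a standard 2-socket Xeon 5500 blade: 2 CPUs x 9 slots x 8GB
print(2 * 9 * 8)            # 144 GB
```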

In a blade chassis you can fit 4 of these blades.

Link Information / Video- here


Blade Chassis UCS 5108

The blade chassis is (as described by Cisco) a crucial building block, as it houses most of the components in the UCS deployment (blades, CNAs and Fabric Extenders). It is a 6U chassis with the ability to house 8 half-width blades (B200 M1), 4 full-width blades (B250 M1), or a combination of the two. It has fewer parts than other blade chassis, as the brains/control of the UCS system lie upstream, outside of the chassis, in the fabric interconnects.

This means it doesn’t take much management and is more energy efficient: the unified fabric (FC and LAN on the same cable) means less cabling, and fewer parts drawing power (no chassis switches/modules as in a traditional blade chassis). The chassis has 8 fans and 4 power supplies, houses 2 Fabric Extenders, and its backplane is 63% open for better airflow.

Link Information / Video - here

Additional Video - here


Fabric Extender (2100)

The fabric extender (aka IOM – Input/Output Module) is one of the new and innovative elements of Cisco UCS. In a traditional blade chassis you would have interconnect modules for Ethernet, InfiniBand, SAS or FC; these modules allow the chassis to be connected to other devices upstream running that protocol. The fabric extenders also sit within the chassis, but they differ in that they carry only one unified fabric (protocol) – FCoE – and they do no switching, unlike traditional chassis interconnect modules. They are an extension of the fabric interconnect (the FCoE upstream switch, which physically sits outside of the chassis, i.e. ToR) to which they are connected, and in which all the management takes place for the multiple chassis connected to it.

They have been described as a distributed line card – they allow control of the chassis/blades/service profiles to be carried out from the fabric interconnect. This also means that a UCS system scales with very little effort – connect the new chassis (via the fabric extenders in the chassis) to the fabric interconnects, acknowledge the new chassis (in UCSM), and an inventory of the chassis will automatically take place and it is ready to use.

The chassis (5108) supports 2 fabric extenders; each fabric extender has 8 internal 10Gig ports (downlinks) that connect to the 8 blade slots, and 4 external 10Gig ports (uplinks) that connect up to the Fabric Interconnect. Given that there is more bandwidth inside the chassis (16*10GigE) than uplink bandwidth (8*10GigE), that brings us to the question of oversubscription, which I’ll cover in part 3 of this series.
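As a quick sketch of the oversubscription arithmetic implied by those port counts (assuming both fabric extenders are installed and all four uplinks on each are cabled):

```python
# Chassis oversubscription, from the port counts above.

ioms_per_chassis = 2        # fabric extenders per 5108 chassis
downlinks_per_iom = 8       # internal 10GigE ports (one per blade slot)
uplinks_per_iom = 4         # external 10GigE ports to the fabric interconnect
port_speed_gbps = 10

downlink_bw = ioms_per_chassis * downlinks_per_iom * port_speed_gbps  # 160 Gbps
uplink_bw = ioms_per_chassis * uplinks_per_iom * port_speed_gbps      # 80 Gbps

print(f"oversubscription: {downlink_bw // uplink_bw}:1")              # 2:1
```

Cabling fewer uplinks per fabric extender simply pushes the ratio up; that trade-off is what part 3 will come back to.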

Link Information - here


Fabric Interconnect (6100)

The fabric interconnects are also a new and innovative element of UCS. The fabric interconnect is the point at which management of the UCS domain occurs, and the funnel through which LAN and SAN traffic enters and exits the UCS domain. Unlike other blade designs, the management sits in the fabric interconnect, which sits outside of the individual chassis, allowing the management of multiple chassis in multiple racks in a ToR (Top Of Rack) design.

As a brief overview/comparison, there are 2 models of fabric interconnect, the 6120XP and the 6140XP. The former is 20 ports, 1U and 520Gbps throughput with one expansion slot; the latter is 40 ports, 2U and 1.04Tbps throughput with two expansion slots. All ports on both models are 10GigE FCoE capable and can be configured as uplink ports (to core network switches) or server ports (to blade chassis), depending on the required number of chassis to be connected and the required uplink (bandwidth) links.
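As a rough sanity check of those throughput figures – and this is my own reading, assuming the quoted numbers include the expansion-module ports and count each 10GigE port full duplex:

```python
# Rough reconciliation of the quoted throughput with the port counts.
# Assumptions (mine, not Cisco's): expansion-module ports are included
# (6-port 10GigE modules) and each 10GigE port is counted full duplex.

def throughput_gbps(fixed_ports, expansion_slots,
                    ports_per_module=6, port_speed_gbps=10):
    total_ports = fixed_ports + expansion_slots * ports_per_module
    return total_ports * port_speed_gbps * 2  # x2 for full duplex


print(throughput_gbps(20, 1))   # 520  -> 6120XP
print(throughput_gbps(40, 2))   # 1040 -> 6140XP (1.04 Tbps)
```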

The fabric interconnects are deployed in pairs; each has an out-of-band management port and they are connected together via cluster ports. The fabric interconnects and the attached chassis form a UCS domain. The fabric interconnects have 3 management IPs (1 each plus a cluster address). One fabric interconnect is active and the other is passive from a management point of view (the passive one is kept up to date via the cluster ports); however, the (server and uplink) ports on both the active and passive fabric interconnects are active, to allow the most throughput to the core network or SAN.

Link/video - here Additional video - here


Expansion Module

The expansion module fits into the fabric interconnect and is the only means by which FC traffic can be broken out to the FC-based infrastructure. If you look at the picture showing the fabric interconnects (above), the right-hand side houses the module (take a look at the 3-D model). There are 4 types of expansion module.

  • 8 port 1/2/4-Gbps FC Expansion module
  • 6 port 1/2/4/8-Gbps FC Expansion Module
  • 4 port FC + 4 port 10GigE Expansion module
  • 6 port 10GigE Expansion module

The 6120 fabric interconnect has 1 slot for the expansion module, whilst the 6140 has 2 slots for expansion modules.

The expansion module therefore gives you flexibility in extending your UCS domain (connecting more chassis) by adding additional downlink (server) connectivity on each fabric interconnect. It is worth noting that FC-based SAN connectivity is only possible via an expansion module, and the likelihood is that you will at least want SAN (FC) connectivity to the present (non-FCoE-ready) data center core.

In the next post we’ll cover the software element of UCS deployment.

Monday, December 21, 2009

FCoE, what is it?

In my last post (Converged Infrastructure) I mentioned FCoE as one of the elements used to bring convergence to the data center. In this post I’ll explain a little about what FCoE is and why it has a role in bringing convergence to the data center.

What is FCoE?

FCoE is Fibre Channel over Ethernet – the encapsulation of Fibre Channel frames inside Ethernet frames, so that Fibre Channel traffic and LAN traffic are sent over the same cables instead of over separate fibre and LAN cabling. Although this could technically be done over a 1GigE network, vendors are only providing 10GigE devices. This means a certain amount of disruption (more on this below) to the existing network architecture.

There is also a requirement for what is called DCB – Data Center Bridging – because Ethernet is 'lossy' (packets will get lost/retransmitted) whilst Fibre Channel is 'lossless' (no frames dropped), and you don't want to lose/drop data being transmitted to your SAN!

In order for FC to run over Ethernet we need to ensure that the losslessness of Fibre Channel is retained when it runs over Ethernet. So various standards are presently being worked on by the Data Center Bridging group to enable a low-latency, lossless Ethernet network that allows FCoE frames to be transmitted on the same bits of wire as LAN traffic.
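Conceptually the encapsulation is simple: the FC frame is left intact and carried as the payload of an Ethernet frame with the FCoE EtherType (0x8906). The Python sketch below is only meant to show that layering – it is not a byte-accurate encoder, and the field names are illustrative.

```python
# Conceptual sketch of FCoE encapsulation -- not a byte-accurate encoder,
# just the layering: a complete FC frame rides inside an Ethernet frame
# whose EtherType is 0x8906 (FCoE).

from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906


@dataclass
class FibreChannelFrame:
    source_id: str        # FC source address (S_ID)
    destination_id: str   # FC destination address (D_ID)
    payload: bytes        # e.g. a SCSI command or data, up to 2112 bytes


@dataclass
class EthernetFrame:
    dst_mac: str
    src_mac: str
    ethertype: int
    payload: FibreChannelFrame


def encapsulate(fc_frame, src_mac, dst_mac):
    # The FC frame is not modified, just wrapped -- which is why the
    # Ethernet underneath must be made lossless (DCB) before FC traffic
    # can safely ride on it.
    return EthernetFrame(dst_mac, src_mac, FCOE_ETHERTYPE, fc_frame)


fc = FibreChannelFrame("0x010203", "0x040506", b"SCSI WRITE ...")
print(encapsulate(fc, "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"))
```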

Why is FCoE Useful?

Consider a typical data center as depicted in the diagram below



What FCoE gives you is convergence at the adapter level (CNA – Converged Network Adapter) within the server (a single card for both SAN and LAN connectivity), so a reduction in HBAs, which in turn means less power used by the server. In addition there is a reduction in switches, as SAN and LAN switches are no longer separate. So the future converged infrastructure will look something like the diagram below (not the best diagram – it is simply there to show the reduction in switches/cables).



There is potential for a 50% reduction in switches/cables in the data center, and less power required (green I.T.) due to the reduction both in physical equipment and in the power used by servers (with fewer cards).

The convergence of the physical infrastructure takes many shapes (some mentioned here), including for example server virtualisation and/or storage virtualisation. FCoE is another (complementary) method; the other methods might bring just as much benefit, but together they can bring greater convergence to the data center.

The FCoE standard was adopted in June 2009 and details can be found here, here and here (this last link is the PDF of the standard). There is still work to be done around DCB, however, and that is mentioned in the second link.

Many of the leading companies (for example Cisco) have FCoE at the center of their Data Center strategy and so it is critical for them that FCoE is adopted widely. It is the means by which Cisco see the convergence of the data center and it is a core part of their Unified Computing System (UCS).

We mentioned earlier that FCoE/DCB will cause some disruption in the data center, due to the requirement for 10GigE (your core network may not be 10GigE at present, for example) and for the DCB standards to be adopted, meaning new hardware – CNAs and FCoE switches – to take advantage of FCoE/DCB. That is why realising the full power of FCoE/DCB will be gradual, rather than what some have termed a rip-and-replace strategy. It will start at the access layer (server edge) and over time move through the aggregation layer into the network core.

If you look at Cisco’s UCS blade infrastructure, that is an example of how this can be achieved NOW – in the blade chassis, FCoE is run over 10GigE ports (on the Fabric Extenders) that connect to a Fabric Interconnect (6120 or 6140), allowing fibre and IP traffic to run over the same cable from the chassis up to the access switch (the Fabric Interconnect).

The Fabric Interconnect has 10GigE ports and can also take a module that has Fibre Channel ports. The Fibre Channel ports then uplink to the normal SAN switches using fibre cables, whilst the Ethernet ports uplink to the core switches using LAN cables. So you have 10GigE FCoE at the server (chassis) edge, over the same cable, which branches out to separate legacy cables as you approach the network core.

I attended the Data Center Of The Future event in which Cisco presented their view on the subject. This included a Q&A session, during which I asked whether FCoE was ready; the response was "yes". So I adjusted my question and asked if I could use it from the server to the core, and was then told that "some standards still needed to be ratified for that to be achieved, but FCoE had been ratified". So there still needs to be some work done before FCoE becomes all-embracing from the server edge to the network core in the data center.

Nigel Poulton has a really good series of posts (deep dives) on FCoE here if you want to know more. Also Dave Convery has a very good post here on FCoE. There is also a very good post here that details the savings from deploying FCoE for a hospital including space and power.

Saturday, December 19, 2009

Data Center 2.0/3.0

I attended the DCOF (Data Center Of the Future) web conference over 2 days (15th/16th December). The event had most of the main data center players listed here (with the notable exception of HP) giving presentations on the future data center.

One of the presentations was given by Cisco, who throughout their presentation made reference to Data Center 3.0 (if we don't have numbers, how do we know where we are? Web 2.0, Enterprise 2.0). After a while I had to ask the question

What is the difference between DC 2.0 and DC 3.0 as Cisco talks about it?

The response:-

"In a nutshell, DC 2.0 refers to the client/server model and distributed resources. DC 3.0 refers to initiatives being taken today around consolidation and virtualization of resources. The goal of DC 3.0 is to be able to manage these resources and leverage them as a service to deploy applications alot more efficiently."

So there you have it: the data center of the future is consolidated and virtualized. No more silos of server, storage and network that are isolated from each other and managed as such. These resources will be tightly integrated, flexible and (in Cisco's eyes) virtualized, managed by integrated tools from a single pane of glass, and service oriented (as opposed to asset oriented).

After the discussion I began to think that there was another difference between Data Center 2.0 and 3.0. My additional thinking is this:

- Data Center 1.0 (the original) was the mainframe era
- Data Center 2.0 was the Unix/proprietary Architecture era
- Data Center 3.0 is the X86/X64 Open Architecture era

I've put it that way because when I originally started in I.T. and walked around a data center, it was nearing the end of mainframe dominance and the explosion in Unix. Over the years there were fewer and fewer mainframes, and more and more Unix (proprietary) platforms. Now when I walk into a data center the Unix platforms are fewer and fewer, whilst x86/x64 rack and blade servers are everywhere - times have changed.

I shall be throwing out some blogs around the contents of the DCOF event from Intel, NetApp etc., as well as overviews of Cisco UCS, Vblock (Acadia), HP BladeSystem Matrix and some of the small(er) players mentioned here.

What is Converged Infrastructure?

In this post we are going to deal with "What is Converged Infrastructure?"; in later posts I'll be looking at the various companies doing the "How to?". The short answer is the unification of the infrastructure - HW and SW - but I think it leads to a lot more (although it starts with the physical infrastructure).

Let's take a look at the back of a rack in a typical data center - do you recognise this picture?

(Photo: DSCN0123, originally uploaded by alonzoD)


The traditional data center server has multiple physical connections to the LAN carrying ILO (Integrated Lights-Out), backup, management, user access and much more. Typically these vary from 100Mb/s for management to maybe 1Gb/s for backup.

In addition you might have redundant connections for some or all of these. So you might have anywhere from 2 LAN connections (test/development server) to upwards of 8 LAN connections if you want a highly redundant (production) server. Now look at the picture again - how many servers can you get in a 42U rack? Well, it depends; 1U/2U/4U servers are not uncommon.

Let's use a 4U server in a 42U rack and do some sums. Say the rack has 8 or 9 servers (we will use 8) and maybe a ToR (Top of Rack) switch, each server with multiple NICs and multiple LAN cables.

Per Rack LAN cabling

8 servers x 5 cables (1 mgmt, 1 backup, 1 ILO and 2 user access) = 40 cables.
Each server also has 2 x single-port HBAs for fibre connection to the SAN

Per Rack SAN cabling

8 x 2 = 16 cables for SAN

Total cables for 1 rack = 56 (not including power or cables for uplink connection to Core Network / SAN fabric)
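The same sums as a small script, with the per-server cable counts above as the assumptions:

```python
# Cable count for one traditional 42U rack, as worked out above.

servers_per_rack = 8
lan_cables_per_server = 5   # 1 mgmt, 1 backup, 1 ILO, 2 user access
san_cables_per_server = 2   # 2 x single-port FC HBAs

lan_cables = servers_per_rack * lan_cables_per_server   # 40
san_cables = servers_per_rack * san_cables_per_server   # 16

print(lan_cables + san_cables)  # 56 cables, before power and uplinks
```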

Then you can multiply that across multiple racks. The effects of all these physical cards/cables:-

Power requirement increased
Restricted airflow
Cabling management nightmare (labeling!!!)

Now consider the management (via tools) of physical infrastructure:-

LAN switch ports
VLAN management
SAN Switch ports
SAN Zoning

Using individual management tools, that is (potentially) a lot of management (and a lot of people).

So what is Converged Infrastructure (IMHO)?

Convergence of the physical items - using techniques to reduce physical items

- Virtualized Servers - VMware, Xen, Hyper-V to reduce physical servers
- Virtualized Network - vSwitch, vNetwork Distributed Switch, Nexus 1000V
- Virtualized Storage - thin provisioning
- FCoE / DCE - using converged network adapters that carry both fibre and LAN traffic on the same physical cable - a 2:1 reduction in cabling
- SR-IOV - a PCIe adapter that appears to the OS as multiple adapters

Convergence of Mgmt - using

- A single pane of glass management tool for Server, Storage and Network
- Dynamic, proactive and automatic configuration of Server, Storage and Network
- Same people to do Server, Storage and Network

Some vendors will put emphasis on different elements of the above points. A key point to note is that Converged Infrastructure (IMHO) is more than just the physical; it is also the management tools (which leads, as a consequence, to convergence of I.T. organisation processes).

From a physical view it might look like this - compare this with our picture above.



Note: This is a picture of a Cisco UCS Blade Chassis - 6U in size, so that is 7 chassis (at a push) in a 42U rack. Each chassis has a maximum of 8 cables (normally 4), so the total number of cables is 56 (which doesn't sound much better). However, there would be 56 servers in the rack (slightly more than the 8 in the previous example) and a single pane of management to go with it - more about Cisco UCS in another post.
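Running the same style of sums for the converged rack in that note (assumptions as stated there: 7 chassis of 8 half-width blades, 8 uplink cables per chassis):

```python
# The converged rack from the note above, using the same style of sums.

chassis_per_rack = 7       # 6U chassis in a 42U rack
blades_per_chassis = 8     # half-width blades
cables_per_chassis = 8     # 2 fabric extenders x 4 uplinks (often only 4 used)

servers = chassis_per_rack * blades_per_chassis   # 56 servers
cables = chassis_per_rack * cables_per_chassis    # 56 cables

print(f"{servers} servers on {cables} cables "
      f"(1 cable per server, versus 7 per server in the rack above)")
```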

Next posts - FCoE, Data Center 2.0/3.0 and Data Challenges

Data Center Of The Future

The data center of the future in terms of technology will be:-

Virtualized
Constituted of private and public (cloud) elements
Converged
Automated
Self-service
Network centric
Energy efficient
Self Managing

It means that:-

Hardware will no longer be unique (it will be commodity based)
Hardware will no longer be under-utilised (it will be running at 70%+)
Reduced Manual Support of Hardware and Software (it will be automated)
IT Focus will no longer be asset based (it will be service based)
Reduced CapEx and Reduced OpEX (hopefully)
Reuse, Reuse, Reuse at every level

The companies/communities driving this change:-

The big(ger) vendors

IBM
HP
Cisco
EMC
Intel
Dell
Oracle/Sun
VMware
Redhat
Citrix
Microsoft
Google
AMD

The small(er) Vendors

Netapp
InteliCloud
LiquidIQ
Panduit
Egenera
Xsigo
and others

The community

Opensource

The purpose of this blog is to:

Focus on the companies/communities driving this change
To be technology focused (because that is what I like)
Starting with "What is Converged Infrastructure?".....next post