Thursday, January 31, 2008

RAID Break and Making the Cut

Based on my RAID research, I have determined I will start with software RAID in a RAID 5 or 1+0 configuration. I still want to research RAID 6, which seems to be in vogue right now.
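
To make this a little more concrete for myself, here is a rough sketch of what creating a 4-disk array with Linux software RAID (mdadm) might look like. This is just me assembling the commands in Python as a note to self; the /dev/sdb through /dev/sde device names are placeholders I made up, and I have not actually run any of this yet.

    # Sketch only: print the mdadm commands I *think* would build a
    # 4-disk software RAID array. Device names are placeholders.
    def mdadm_create(level, devices, md="/dev/md0"):
        """Return an mdadm --create command for the given level and disks."""
        return ("mdadm --create %s --level=%s --raid-devices=%d %s"
                % (md, level, len(devices), " ".join(devices)))

    disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]

    print("# RAID 5 -- usable space of 3 of the 4 disks:")
    print(mdadm_create(5, disks))
    print("# RAID 10 -- usable space of 2 of the 4 disks:")
    print(mdadm_create(10, disks))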

Taking a break from storage, I started to slim down the file server list, and the following potential solutions were cut:

Miracle Linux - No English Documents Found
Nitix - Software is a trial version only
Server Optimized Linux - Only RAID 1 supported and limited documentation
TinySofa - No Documents Found
TupiServer - No English Documents Found

That leaves:

Annvix
CentOS
EnGarde
FreeNAS
Openfiler
OpenNA Linux
SLAMPP Live CD
SME Server
StartCom
Ubuntu Server

Tuesday, January 29, 2008

Fakeraid, Softraid, Hardraid, Hell ???

I am definitely getting the feeling that there are no hard and fast rules with RAID on any level.

It looks like a few of the big players in the hardware RAID business at the SOHO market level are:

3ware
Areca
High Point Technologies
LSI
Promise Technology

A quick look at Newegg revealed 361 RAID cards with prices ranging from $11.99 to $9,999.99. Gulp.

I limited my search to a card that would support a minimum of four SATA II connections in a RAID 5 or 10 configuration. The cheapest new card I could find was a High Point RocketRAID 1740, a PCI SATA I/SATA II controller card supporting RAID 0/1/5/10/JBOD, for $121.00.
This is very doable for a small business solution but probably at the upper end of the price spectrum for most home applications.

Software RAID was starting to look good at this point. A little more digging revealed that, based on the articles I read, most people running open source software believe software RAID is the way to go. Some of the reasons cited are best summarized at Linux: Why software RAID?

Why prefer Linux software RAID?
  • Potential for increased hardware and software biodiversity
  • Kernel engineers have much greater ability to diagnose and fix problems, as opposed to a closed source firmware. This has often been a problem in the past, with hardware RAID.
  • Disk format is public, thus no vendor lock-in: your data is not stored in a vendor-proprietary format.
  • A controller-independent, vendor-neutral layout means disks can be easily moved between controllers. Sometimes a complete backup+restore is required even when moving between hardware RAID models from the same vendor.
  • Eliminates single points of failure (SPOF) compared to similar configurations of hardware RAID.
  • RAID speed increases as host CPU count (multi-thread, multi-core) increases, following current market trends.
  • Cost. A CPU and memory upgrade is often cheaper and more effective than buying an expensive RAID card.
  • Level of abstraction. Linux software RAID can distribute data across ATA, SCSI, iSCSI, SAN, network or any other block device. It is block device agnostic. Hardware RAID most likely cannot even span a single card.
  • Hardware RAID has a field history of bad firmwares corrupting data, locking up, and otherwise behaving poorly under load. (certainly this is highly dependent on card model and firmware version)
  • Hardware RAID firmwares have a very limited support lifetime. You cannot get firmware updates for older hardware. Sometimes the vendor even ceases to exist.
  • Each hardware RAID has a different management interface, and level of feature support.
  • Your hardware RAID feature set is largely locked in stone, at purchase time. With software RAID, the feature set grows with time, as new features are added to Linux... no hardware upgrade required.
  • Additional RAID mode support. Most hardware controllers don't support RAID-6 as Linux software RAID does, and Linux will soon be adding RAID-5E and RAID-6E support.
  • Many ATA-based hardware RAID solutions either (a) fail to manage disk lifetimes via SMART, or (b) manage SMART diagnostics in a non-standard way.

Why prefer Linux hardware RAID?

  • Software RAID may saturate PCI bus bandwidth long before a hardware RAID card does (this presumes multiple devices on a single PCI bus).
  • Battery backup on high end cards allows faster journalled rebuilds.
  • Battery-backed write-back cache may improve write throughput.
After reading all of this (some of which, I must say, I do not fully understand yet), it sounds like there are some compelling reasons to go with software RAID. Another article supporting this was from Unix Pro News, about a hardware solution (3ware) that failed, with the author going back to using software RAID. He concluded his article by saying:

Let's just say I've been burned a few times in the past.

Anyway, soon I can finally migrate the data for this site and several others off my old (going on 6 years old) server in Ohio (happily running Software RAID).

In retrospect, I was adding complexity and a new point of failure to a system that had always worked fine in the past. I've learned my lesson.

During all of this, I kept seeing that one should avoid FakeRAID. I had no clue what this was, so I looked it up and found a reference to it on Wikipedia:

Hybrid RAID implementations have become very popular with the introduction of inexpensive RAID controllers, implemented using a standard disk controller and BIOS (software) extensions to provide the RAID functionality. The operating system requires specialized RAID device drivers that present the array as a single block based logical disk. Since these controllers actually do all calculations in software, not hardware, they are often called "fakeraids", and have almost all the disadvantages of both hardware and software RAID.


A more humorous description was over at Snowflakes in Hell:

Whoever decided that “FakeRAID”, which is a highly technical term used to describe the types of Serial ATA RAID appearing on some cheaper motherboards, was a good idea needs a severe beating. It appears that FakeRAID is just basically a BIOS hint, requiring the CPU on the machine to do the majority of the work with regards to creating and maintaining the array. I was trying to make Ubuntu do the FakeRAID thing on a server at work, but I think I’m just going to use the Linux software RAID, which seems to be the conventional wisdom these days anyway.

Now back to your regularly scheduled gun blogging.

I guess I will not worry too much about what RAID levels are supported by any particular motherboard during future purchasing decisions...
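
For my own notes, here is a quick sketch of how I might check what a given Linux box is actually using: md software RAID arrays show up in /proc/mdstat, while dmraid-activated "FakeRAID" sets usually appear as device-mapper nodes under /dev/mapper. This is just my current understanding from the reading above, not something I have verified on real hardware.

    # Rough check: md software RAID vs. dmraid "FakeRAID" (or neither).
    # Based only on my reading so far; paths may need adjusting.
    import os

    def md_arrays():
        """md array names listed in /proc/mdstat, e.g. ['md0']."""
        try:
            with open("/proc/mdstat") as f:
                return [line.split()[0] for line in f if line.startswith("md")]
        except IOError:
            return []

    def mapper_nodes():
        """Device-mapper nodes, where dmraid-activated sets usually appear."""
        try:
            return [n for n in os.listdir("/dev/mapper") if n != "control"]
        except OSError:
            return []

    print("md software RAID arrays:", md_arrays() or "none")
    print("/dev/mapper nodes (possible FakeRAID sets):", mapper_nodes() or "none")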

Monday, January 28, 2008

More RAID Stuff

I continued my RAID research, and RAID 1+0 (or 10) is looking better and better. It appears RAID 5 is processor intensive compared with RAID 1+0 due to the parity calculations. A good comparison chart of the different RAID levels was found at The PC Guide. In fact, The PC Guide has lots of good information on numerous topics.

I will still look at RAID 5, but I want to move on to the software vs. hardware RAID issue. This could be a contributing factor in what RAID solution is ultimately chosen. It appears many think hardware RAID is the way to go, but many in the open source community think software RAID is better.

Some articles I need to read:

Linux Software RAID Vs. Hardware RAID

Monitoring and Managing Linux Software RAID
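
Before I get through those, here is my first guess at what basic monitoring might look like. From what I have read, /proc/mdstat shows each array's member status as something like [UU] or [U_], with an underscore marking a failed or missing disk. This is only a sketch based on that reading; the articles above presumably cover the proper way (mdadm --monitor and friends).

    # Sketch: flag degraded software RAID arrays by reading /proc/mdstat.
    # An underscore in the [UU...] status string marks a missing/failed disk.
    import re

    def degraded_arrays(path="/proc/mdstat"):
        """Return (array, status) pairs where a member disk is missing."""
        with open(path) as f:
            text = f.read()
        # Each array block looks roughly like:
        #   md0 : active raid5 sdb1[0] sdc1[1] sdd1[2]
        #         976767872 blocks level 5, 64k chunk ... [3/3] [UUU]
        pairs = re.findall(r"^(md\d+) : .*?\[([U_]+)\]", text, re.M | re.S)
        return [(name, status) for name, status in pairs if "_" in status]

    if __name__ == "__main__":
        bad = degraded_arrays()
        if bad:
            for name, status in bad:
                print("WARNING: %s is degraded (%s)" % (name, status))
        else:
            print("No degraded md arrays found.")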

Friday, January 25, 2008

RAID 5 or 1+0 Continued


After reading this article, Smart SOHOs Don't Do RAID, I am starting to second-guess whether a RAID is really needed after all.

Summarizing from the BytePile website:

RAID 5 - Most versatile RAID level
RAID Level 5 requires a minimum of 3 drives to implement

Advantages: Highest Read data transaction rate. Medium Write data transaction rate. Low ratio of ECC (Parity) disks to data disks means high efficiency. Good aggregate transfer rate.

Disadvantages: Disk failure has a medium impact on throughput. Most complex controller design. Difficult to rebuild in the event of a disk failure (as compared to RAID level 1). Individual block data transfer rate same as single disk.

Recommended Applications: File and application servers; database servers; WWW, e-mail, and news servers; intranet servers.


RAID 10 - Very High Reliability combined with High Performance
RAID 10 requires a minimum of 4 drives to implement.

Advantages: RAID 10 is implemented as a striped array whose segments are RAID 1 arrays. RAID 10 has the same fault tolerance as RAID level 1. RAID 10 has the same overhead for fault-tolerance as mirroring alone. High I/O rates are achieved by striping RAID 1 segments. Under certain circumstances, RAID 10 array can sustain multiple simultaneous drive failures. Excellent solution for sites that would have otherwise gone with RAID 1 but need some additional performance boost.

Disadvantages: Very expensive / high overhead. All drives must move in parallel to the proper track, lowering sustained performance. Very limited scalability at a very high inherent cost.

Recommended Applications: Database server requiring high performance and fault tolerance.
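
To put the capacity trade-off into actual numbers, here is a quick back-of-the-envelope calculation of my own (using four 500 GB drives as an example), following the rules above: RAID 5 loses one disk's worth of space to parity, while RAID 10 loses half the raw capacity to mirroring.

    # Back-of-the-envelope usable capacity for RAID 5 vs. RAID 10.
    def raid5_usable(n, size_gb):
        # One disk's worth of capacity goes to distributed parity.
        return (n - 1) * size_gb

    def raid10_usable(n, size_gb):
        # Half the disks are mirror copies, so half the raw space is usable.
        return (n // 2) * size_gb

    drives, size = 4, 500  # example: four 500 GB drives
    print("Raw capacity:   %d GB" % (drives * size))
    print("RAID 5 usable:  %d GB (survives any single drive failure)"
          % raid5_usable(drives, size))
    print("RAID 10 usable: %d GB (survives one failure, sometimes two if they"
          " hit different mirror pairs)" % raid10_usable(drives, size))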


This all sounds good, but would I be better served (no pun intended) by a second file server or NAS? This could potentially address the need for off-site backups while eliminating the need for a RAID. Will or can I take advantage of the performance increases associated with a RAID?

Storage is cheap, so a belt-and-suspenders solution may be a viable way to take care of all my data needs while capitalizing on any performance gains.

Before I started posting this blog I was eyeing the very hackable Buffalo 500GB LinkStation Live, which runs Linux. It has been heavily discounted as of late and can be purchased for around $200 USD.

I would have to verify the following:

"In order to make this approach work, one of the two NASes must support scheduled backup to or from a networked drive. Most all NASes support backup to a USB attached drive and many do this trick with a networked share. But some drives support only attached drive backup."

Thursday, January 24, 2008

Which RAID Type? Hardware or Software?

After looking at all of this information, I realized I do not want to lose any data after setting this server up. I started to read over different articles and message threads about RAID (Redundant Array of Independent Disks) and came across a few interesting ones that were helpful:

The essential RAID primer
RAID Types - Classifications

Chipset Serial ATA and RAID performance compared
Why home RAID won't fly
Sorry about your broken RAID 5
Which RAID for a Personal Fileserver

RAID 0: This is a striped set; there is no redundancy. One drive goes, everything's gone. Usable space = 100%.

RAID 1: This is a mirrored set. Typically this involves 2 drives. One drive is an exact copy of the second. If a drive fails, you replace it and rebuild the set. Life goes on. Usable space = 50%. Most IDE RAID cards only support RAID 0 and 1.

RAID 5: This is a striped set with parity. You get the performance associated with a striped set. Particularly on reads. If you have 4 drives, there are 4 stripes. 3 of those stripes are data stripes, the 4th is parity. Lose 1 drive and the parity information is used to rebuild the set. Usable space = (n-1)/n. To do this in hardware is typically fairly expensive.

For a file server, I'd use the combination of RAID 1 and striping known as RAID 1+0 or RAID 10.
The benefits are that you get the same protection as with RAID 1, but lose the speed penalty, all without needing special hardware or spare CPU power for expensive CRC calculations.

With a 4 drive RAID 1+0, you'll get read performance of 2x-4x a single drive, while writes will be from 1x-2x. In theory, that is. In reality, if using a RAID PCI card or motherboard solution hooked to the south bridge, you'll most likely max out the read speed.

Anyhow, it's a very cheap solution that doesn't tax your CPU too much even if done through software (like with a highpoint controller), and it does give you peace of mind.

The worst downside is that you will have to take the system down to change a drive (correct me if I'm wrong, but I've never seen a hot-swappable RAID 1+0 solution), and the performance before you do that will take a substantial hit.

RAID 4/5 is nice because it doesn't waste a lot of drive space, but it comes at the price of very slow writes, and very high CPU use unless you also get a hardware controller with an on-board CPU.


RAID 1+0 is the Cadillac of RAID

Yet if you do choose to use RAID, I submit that for important data, RAID 1+0 should be your first choice. It offers good performance - not as good as RAID 5 on reads, but much better on small writes - and it is much more resilient than RAID 5 when you do have a disk failure.

A RAID 5 rebuild costs you about half your IOPS capacity as well as controller or CPU cycles. With RAID 1+0 a rebuild is a simple disk to disk copy which is as efficient and fast as you can get.

Because it mirrors, RAID 1+0 capacity is more expensive than RAID 5. For business critical data, RAID 1+0 gives the best combination of performance, availability and redundancy.


After this preliminary look at RAID, it would seem RAID 5 or 10 would be the way to go. These would require a minimum of 3 or 4 disks, respectively. Another alternative is RAID 1E, which I plan on looking at in further detail.

Wednesday, January 23, 2008

FreeBSD: The best server OS?

I read this article about FreeBSD being the best server OS and found it interesting. One part in particular caught my eye:

"FreeBSD Jails (easy to maintain/update many different servers on one box; low overhead). This feature alone keeps me with FBSD. I don't have to install virtualization software or heavily modify a kernel for this to work, either."

What would it take to make a complete BSD solution?

Tuesday, January 22, 2008

Weeding Out Solutions

The potential solutions list was a little overwhelming, so I decided to start weeding some of them out, beginning with the security category. So far I have eliminated the following:

Euronode - no English documentation on website
Astaro - could not find a link for the free version, only a trial version

Smoothwall has a very polished and well-thought-out website, over 15,000 forum members, and VMware images of the product.

The new security list is:
After a quick review of the website, I would have to rank the software into three tiers:

Tier 1
Tier 2
Tier 3
I still need to work on a list of what this server should be able to accomplish.

Monday, January 21, 2008

Linux Outlaws 12

While listening to Linux Outlaws Episode 12, the hosts talked about different virtualization software. They indicated they both use VMware and it works well.
  • They are both interested in Xen because many ISPs use it, as do Fedora and Red Hat based Linux distributions.
  • QEMU is used by Damn Small Linux (DSL) and works well with the kernel.
  • They were concerned about the USB speeds in VirtualBox (it does not support USB 2.0 in the free version?). The differences are here.
  • They also noted Parallels was slow compared to VMware.

Sunday, January 20, 2008

Objective

My objective is to post about Virtualization, Open Source, and Servers, and get an Education in the process; hence VOSS. This blog is a tool where I can organize my thoughts, document my research, and potentially solicit feedback while taking into account my interests and preferences.

Software Goals
I want to utilize off-the-shelf open source software so that whatever I do can be easily duplicated at low cost, by myself as well as others if they are so interested. I want to use Linux and BSD based software solutions whenever possible.

Potential Software Solutions

Virtualization
VoIP - Voice over Internet Protocol
HTPC - Home Theatre Personal Computer
File Server
Security
Hardware Goals
I also want any hardware that is going to be used to be cost-effective and environmentally friendly. I want to use articles like this one to help guide my decisions: How to Build a Green PC

Next Steps
  1. Better define what I want to accomplish
  2. Create initial list of questions
  3. More research on protocols
  4. Start list of must have vs. nice to have features