Storage Server Technology: July 2011

Tuesday, July 26, 2011

Real-Time Data Compression's Impact on SSD Throughput Capability

SSDs are one of the hottest technologies in storage. They offer great throughput (read and write) and great IOPS performance, but they also come with a limited number of rewrite cycles and a much higher price than spinning media. In addition, making the technology perform well requires new controller techniques. Some of these techniques improve performance or longevity, or both. However, these techniques must be "tuned" so the best possible performance is extracted from the technology. One SSD controller company has a technique that improves both the performance and the longevity, but the improvements depend entirely on your data.
SandForce is a fairly new SSD controller company. Its products are used in a great number of SSDs, from affordable consumer SSDs to enterprise-class SSDs. The technique SandForce has developed is real-time data compression in the controller. The controller compresses the data before it is written to the drive, which can increase performance and improve the longevity of the drive.
This article examines the concepts of using real-time data compression in SSDs by taking a consumer SandForce-based SSD for a spin. I test the throughput performance using IOzone, which allows me to vary the compressibility (dedupability) of the data so I can see the impact on the throughput performance of the drive. The results are pretty exciting and interesting as we'll see.

Real-Time Data Compression in SSDs

I talked about real-time data compression in SandForce SSD controllers in a different article, but the concepts definitely bear repeating because they are so fascinating.

The basic approach SandForce has taken is to use some of the capabilities from its SSD controller for real-time data compression. Since the implementation is proprietary, I can only speculate on what is going on inside the controller. Most likely, the uncompressed data comes into the drive (the controller) and is likely stored in a buffer. The controller then compresses the data in the buffer in individual chunks or perhaps coalescing the data chunks prior to compression. This process takes some time and computational resources prior to writing it to the storage media.
Once the data is compressed, it is placed on the blocks within the SSD. However, things are not quite this simple. To ensure the drive is reporting the correct amount of data stored, presumably the uncompressed data size is also stored in some sort of metadata format on the drive itself (maybe within the compressed data?). This means that if a data request comes into the controller with a request for the number of data blocks or size of a data block, the correct size is reported.
But since the data has been compressed, the amount of data that is written is less than the uncompressed data. Less data is written to the storage media, which means less time is used, which means faster throughput. The amount of time used to write the data is proportional to the size of the compressed data (i.e., the compressibility of the data), which drives the throughput performance. But we also need to remember that the "latency" for a SandForce controller can be higher than a typical controller because of the time needed to compress the data. I'm sure SandForce has taken this into account so that not too much time is spent compressing the data. In fact, I bet it's a constant time compression algorithm.
During a read operation, the compressed data is probably read into a cache and then uncompressed. After that, it is sent to the operating system as though the data was never compressed. Presumably, the performance also depends on the ability to uncompress the data quickly so the algorithm should have a fixed time.
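The speculated write and read path can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `zlib` stands in for the proprietary algorithm, the class name `ToyCompressingDrive` and the 4 KiB block size are hypothetical, and the time spent compressing is ignored.

```python
import os
import zlib


class ToyCompressingDrive:
    """Toy model of the speculated data path; not SandForce's actual design."""

    def __init__(self):
        # logical block address -> (compressed bytes, original length)
        self.blocks = {}
        self.bytes_from_host = 0
        self.bytes_to_flash = 0

    def write(self, lba, data):
        # zlib is a stand-in for the controller's proprietary compressor.
        compressed = zlib.compress(data)
        # Keep the uncompressed length as metadata so sizes are reported correctly.
        self.blocks[lba] = (compressed, len(data))
        self.bytes_from_host += len(data)
        self.bytes_to_flash += len(compressed)

    def read(self, lba):
        compressed, original_length = self.blocks[lba]
        data = zlib.decompress(compressed)
        assert len(data) == original_length
        return data

    def reported_size(self, lba):
        # The host always sees the uncompressed size.
        return self.blocks[lba][1]


drive = ToyCompressingDrive()
drive.write(0, b"storage " * 512)   # 4 KiB of very repetitive data
drive.write(1, os.urandom(4096))    # 4 KiB of essentially incompressible data
print("bytes from host:", drive.bytes_from_host)
print("bytes to flash: ", drive.bytes_to_flash)
```

In this model the media sees fewer bytes than the host sent, and the gap comes almost entirely from the repetitive block. That is the effect that would translate into shorter write times on a real drive.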

The really interesting and exciting part of this is that the compressibility of your data influences the performance of the storage media. If your data is very compressible, then your performance can be very, very good. If your data is as compressible as a rock then your performance may not be as good. But before you start thinking, "my data is very incompressible," note that I have seen lots of different data sets (even binary ones) capable of being compressed. Also keep in mind that the SandForce controller needs to compress the data only in a specific chunk, not the entire file. So it's difficult to say a priori with any certainty that your data will not perform well on a SandForce controller. I'm sure SandForce has spent a great deal of time running tests on typical data for the target markets and thus has a good idea of what it can and cannot do. Since the controller is quite popular, I'm sure the efforts have been successful.
However, the fundamental fact remains that the performance of the SSD depends on the compressibility of the data. Thus, I decided to get a consumer SSD with a SandForce 1222 controller and run some IOzone tests with variable levels of dedupability (compressibility) to see how the SSD performs.

Testing/Benchmarking Approach and Setup

The old phrase "if you're going to do it, do it right" definitely rings true for benchmarking. All too often storage benchmarks are nothing more than marketing materials providing very little useful information. In this article, I will follow concepts that aim to improve the quality of the benchmarks. In particular:

The motivation behind the benchmarks will be explained (if it hasn't already).
Relevant and useful storage benchmarks will be used.
The benchmarks will be detailed as much as possible.
The tests will run for more than 60 seconds.
Each test is run 10 times, and the average and standard deviation of the results will be reported.

These basic steps and techniques can make benchmarking and testing much more useful.
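Reducing repeated runs to an average and a standard deviation takes only the standard library. The sketch below assumes the sample standard deviation (n - 1 in the denominator), and the throughput values are hypothetical placeholders rather than measured results.

```python
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of run results."""
    return statistics.mean(runs), statistics.stdev(runs)


# Hypothetical throughput readings in KB/s from 10 runs (placeholders, not measurements).
runs_kb_s = [251200, 249800, 252300, 250100, 248900,
             251700, 250600, 249300, 252000, 250400]

mean, stdev = summarize(runs_kb_s)
print(f"mean = {mean:.0f} KB/s, standard deviation = {stdev:.0f} KB/s")
```

A small standard deviation relative to the mean suggests the runs are repeatable. A large one is a hint that caching or background activity is disturbing the test.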
The benchmarks in this article are designed to explore the write, read, random write, random read, fwrite, and fread performance of the SSD. I don't really know what the results will be prior to running the tests so there are no expectations -- it's really an exploration of the performance.
This article examines the performance of a 64GB Micro Center SSD that uses a SandForce 1222 controller. This is a very inexpensive drive that provides about 60GB of usable space for just under $100 (about $1.67/GB). The specifications on the website state the drive has a SATA 3.0 Gbps interface (SATA II), and it has a performance of up to 270 MB/s for writes, 280 MB/s for reads, and up to 50,000 IOPS.

The highlights of the system used in the testing are:

GigaByte MAA78GM-US2H motherboard
An AMD Phenom II X4 920 CPU
8GB of memory (DDR2-800)
Linux 2.6.32 kernel
The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
/home is on a Seagate ST1360827AS
The Micro Center SSD is mounted as /dev/sdd

I used CentOS 5.4 on this system, but with my own 2.6.32 kernel. Ext4 is used as the file system. The entire device, /dev/sdd, was used for the file system so it is aligned with page boundaries.
I used IOzone because it is one of the most popular throughput benchmarks. It is open source and written in very plain ANSI C (not an insult but a compliment), and perhaps more importantly, it tests different I/O patterns, which very few benchmarks actually do. It is capable of single-thread, multi-thread, and multi-client testing. The basic concept of IOzone is to break up a file of a given size into records. Records are written or read in some fashion until the file size is reached. Using this concept, IOzone has a number of tests that can be performed. In the interest of brevity, I limited the benchmarks to write, read, random write, random read, fwrite, and fread, all of which are described below.

Write: This is a fairly simple test that simulates writing to a new file. Because of the need to create new metadata for the file, many times the writing of a new file can be slower than rewriting to an existing file. The file is written using records of a specific length (either specified by the user or chosen automatically by IOzone) until the total file length has been reached.
Read: This test reads an existing file. It reads the entire file, one record at a time.
Random Read: This test reads a file with the accesses being made to random locations within the file. The reads are done in record units until the total reads are the size of the file. Many factors impact the performance of this test, including the OS cache(s), the number of disks and their configuration, disk seek latency and disk cache.
Random Write: The random write test measures the performance when writing a file with the accesses being made to random locations within the file. The file is opened to the total file size, and then the data is written in record sizes to random locations within the file.
fwrite: This test measures the performance of writing a file using a library function "fwrite()". It is a binary stream function (examine the man pages on your system to learn more). Equally important, the routine performs a buffered write operation. This buffer is in user space (i.e., not part of the system caches). This test is performed with a record length buffer being created in a user-space buffer and then written to the file. This is repeated until the entire file is created. This test is similar to the "write" test in that it creates a new file, possibly stressing the metadata performance.
fread: This is a test that uses the fread() library function to read a file. It opens a file and reads it in record lengths into a buffer that is in user space. This continues until the entire file is read.
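The difference between the sequential and random tests comes down to the order in which record offsets are visited. The sketch below is a simplification under stated assumptions: it visits every record exactly once in shuffled order, which may differ from how IOzone actually picks random locations, and the helper names `record_offsets` and `write_records` are hypothetical.

```python
import os
import random
import tempfile


def record_offsets(file_size, record_size, shuffle=False, seed=0):
    """Return the byte offset of every record, optionally in random order."""
    offsets = list(range(0, file_size, record_size))
    if shuffle:
        random.Random(seed).shuffle(offsets)
    return offsets


def write_records(path, file_size, record_size, shuffle=False):
    """Write a file one record at a time, sequentially or at random offsets."""
    record = bytes(record_size)  # zero-filled record; real tests control the content
    with open(path, "wb") as f:
        f.truncate(file_size)    # open the file to its total size first
        for offset in record_offsets(file_size, record_size, shuffle):
            f.seek(offset)
            f.write(record)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile")
    write_records(path, file_size=64 * 1024, record_size=16 * 1024, shuffle=True)
    print("file size after random write:", os.path.getsize(path))
```

The file ends up the same size either way; only the access order changes, and that order is what stresses the storage device differently.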

Other options can be tested, but for this exploration only the previously mentioned tests will be examined.
For IOzone, the system specifications are fairly important since they affect the command-line options. In particular, the amount of system memory is important because this can have a large impact on the caching effects. If the problem sizes are small enough to fit into the system or file system cache (or at least partially), it can skew the results. Comparing the results of one system where the cache effects are fairly prominent to a system where cache effects are not conspicuous is comparing the proverbial apples to oranges. For example, if you run the same problem size on a system with 1GB of memory compared to a system with 8GB you will get much different results.
For this article, cache effects will be limited as much as possible. Cache effects can't be eliminated entirely without running extremely large problems and forcing the OS to virtually eliminate all caches. However, one of the best ways to minimize the cache effects is to make the file size much bigger than the main memory. For this article, the file size is chosen to be 16GB, which is twice the size of main memory. This is chosen arbitrarily based on experience and some urban legends floating around the Internet.
For this article, the total file size was fixed at 16GB and four record sizes were tested: (1) 1MB, (2) 4MB, (3) 8MB, and (4) 16MB. For a file size of 16GB that is roughly (1) 16,000 records, (2) 4,000 records, (3) 2,000 records, and (4) 1,000 records, respectively. Smaller record sizes took too long to run since the number of records would be very large, so they are not used in this article.
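The file size and record counts follow from simple arithmetic. The sketch below assumes binary units (1 MB = 1024 x 1024 bytes) and takes the factor of two over main memory from the text; the helper names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def file_size_for(memory_bytes, factor=2):
    """Pick a test file size as a multiple of main memory to limit cache effects."""
    return memory_bytes * factor


def record_count(file_size, record_size):
    """Number of records needed to cover the whole file."""
    return file_size // record_size


memory = 8 * GIB                    # main memory of the test system
file_size = file_size_for(memory)   # 16 GiB with the default factor of 2

for record_mib in (1, 4, 8, 16):
    count = record_count(file_size, record_mib * MIB)
    print(f"{record_mib:>2} MB records: {count} records")
```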
The command line for the first record size (1MB) is,

./IOzone -Rb spreadsheet_output_1M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 1M > output_1M.txt

The command line for the second record size (4MB) is,

./IOzone -Rb spreadsheet_output_4M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 4M > output_4M.txt

The command line for the third record size (8MB) is,

./IOzone -Rb spreadsheet_output_8M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 8M > output_8M.txt

The command line for the fourth record size (16MB) is,

./IOzone -Rb spreadsheet_output_16M.wks -s 16G -+w 98 -+y 98 -+C 98 -r 16M > output_16M.txt

The options "-+w 98", "-+y 98", and "-+C 98" control the dedupability (compressibility) of the data. IOzone uses the term dedupe to describe whether the data is capable of being deduplicated. For the purposes of these tests this is basically the same as compressible, so I will use the terms interchangeably. The number "98" in the options means the data is 98 percent dedupable, hence very compressible. These three options allow me to control the compressibility of the data, so I can examine the impact of data compressibility on performance.
I tested three levels of data compressibility -- 98 percent (very compressible), 50 percent, and 2 percent (very incompressible). I wanted to get an idea of the range of performance with these levels of data compressibility but I didn't want to get unrealistic results with either 100 percent compressible data or 0 percent compressible data. (Though I'm not really sure if either of those is really possible.)
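The full test matrix is four record sizes times three dedupe levels. The sketch below only builds the command strings and does not run them. It assumes the 50 percent and 2 percent runs use the same three options with the number changed, and the output file names that include the dedupe level are hypothetical.

```python
RECORD_SIZES = ["1M", "4M", "8M", "16M"]
DEDUPE_LEVELS = [98, 50, 2]


def iozone_command(record_size, dedupe_pct):
    """Build one IOzone command line in the same form as the commands above."""
    # Hypothetical naming: the dedupe level is added so output files do not collide.
    tag = f"{record_size}_{dedupe_pct}"
    return (f"./IOzone -Rb spreadsheet_output_{tag}.wks -s 16G "
            f"-+w {dedupe_pct} -+y {dedupe_pct} -+C {dedupe_pct} "
            f"-r {record_size} > output_{tag}.txt")


commands = [iozone_command(r, d) for d in DEDUPE_LEVELS for r in RECORD_SIZES]
for command in commands:
    print(command)
```

Each of the 12 commands would then be repeated 10 times to obtain the averages and standard deviations.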

Results

All of the results are presented in bar chart form, with the average values plotted and the standard deviation shown as error bars. For each of the three levels of compressibility (98 percent, 50 percent and 2 percent), I plot the results for the four record sizes (1MB, 4MB, 8MB, and 16MB).
The first throughput result is the write throughput test. Figure 1 below presents the results for the four record sizes and the three levels of dedupability (compressibility).

(Figure 1) Write Throughput Results from IOzone in KB/s

The first obvious thing you notice in Figure 1 is that as the level of compressibility decreases (thus decreasing dedupability), the performance goes down. The best performance for the cases run was for a 1MB record size and 98 percent dedupable data, where the write throughput performance was about 260 MB/s, which is close to the stated specifications for the drive. As record size increases, the performance goes down, so that at a record size of 16MB, the write throughput performance was about 187 MB/s for 98 percent dedupability.
For data that is 50 percent dedupable (compressible), the performance for a record size of 1 MB was only about 128MB/s, and for data that is almost incompressible (2 percent dedupable), the performance for a record size of 1 MB was only about 97 MB/s. So the performance drops off as the data becomes less and less compressible (as one might expect). Another interesting observation is that as the data becomes less compressible, there is little performance difference between record sizes.

(Figure 2) Read Throughput Results from IOzone in KB/s

Figure 2 presents the read throughput results. These results differ from the write results in that the performance does not decrease much as the level of data compressibility falls. For a 1MB record size, at 98 percent dedupable data, the performance was about 225 MB/s, for 50 percent dedupable data the performance was about 202 MB/s, and for 2 percent dedupable data, the performance was about 192 MB/s.

In addition, as the level of compressibility decreases the performance difference between record sizes almost disappears. Compare the four bars for 98 percent dedupable data and for 2 percent dedupable data. At 2 percent the performance for the four record sizes is almost the same. In fact, for larger record sizes, the read performance actually improves as the data becomes less compressible.

Figure 3 presents the random write throughput results for the four record sizes and the three levels of dedupability (compressibility).

(Figure 3) Random Write Throughput Results from IOzone in KB/s

Figure 3 reveals that just like write performance, random write performance drops off as the level of data compressibility decreases. For the 1MB record size, the random write throughput was about 241 MB/s for 98 percent dedupable data, 122 MB/s for 50 percent dedupable data, and about 93 MB/s for 2 percent dedupable data.
In addition, just like the write performance, the performance difference between record sizes is very small as the level of compressibility decreases. So for small levels of dedupable data, the record size had very little impact on performance over the range of record sizes tested.
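The sensitivity of each test to compressibility is easier to compare as a fraction of the best case. The sketch below uses only the approximate 1MB record size figures quoted in the text (in MB/s); the helper name `retained` is hypothetical.

```python
# Approximate 1MB-record throughput in MB/s at 98, 50, and 2 percent dedupable data,
# as quoted in the text.
reported = {
    "write": (260, 128, 97),
    "read": (225, 202, 192),
    "random write": (241, 122, 93),
}


def retained(best, value):
    """Fraction of the 98 percent dedupable throughput that is retained."""
    return value / best


for test, (at_98, at_50, at_2) in reported.items():
    print(f"{test:>12}: {retained(at_98, at_50):.0%} at 50 percent, "
          f"{retained(at_98, at_2):.0%} at 2 percent")
```

By this measure the write tests keep well under half of their best-case throughput at 2 percent dedupable data, while the read test keeps most of it.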
Figure 4 presents the random read throughput results for the four record sizes and the three levels of dedupability (compressibility).

(Figure 4) Random Read Throughput Results from IOzone in KB/s

The general trends for random read performance mirror those of the read performance in Figure 2. Specifically:

The performance drops very little with decreasing compressibility (dedupability).
As the level of compressibility decreases, the performance for larger record sizes actually increases.
As the level of compressibility decreases, there is little performance variation between record sizes.

Figure 5 below presents the fwrite throughput results for the four record sizes and the three levels of dedupability (compressibility).

(Figure 5) Fwrite Throughput Results from IOzone in KB/s

These test results mirror those of the write throughput test (Figure 1). In particular,

As the level of compressibility decreases, the performance drops off fairly quickly.
As the level of compressibility decreases, there is little variation in performance as the record size changes (over the record sizes tested).

Figure 6 presents the fread throughput results for the four record sizes and the three levels of dedupability (compressibility).

(Figure 6) Fread Throughput Results from IOzone in KB/s

Like the two previous read tests (read, random read), fread performance shows the same general trends:

The performance drops only slightly with decreasing compressibility (dedupability).
As the level of compressibility decreases, the performance for larger record sizes actually increases.
As the level of compressibility decreases, there is little performance variation between record sizes.

Summary

SandForce is quickly becoming a dominant player in the fast-growing SSD controller market. It makes SSD controllers for many consumer drives and increasingly, enterprise-class SSDs. One of the really cool features of its controllers is that they do real-time data compression to improve throughput and increase drive longevity. The key aspect to this working for you is that the ultimate performance depends on your data. More precisely -- the compressibility of your data.
For this article I took a consumer SSD that has a SandForce 1222 controller and ran some throughput tests against it using IOzone. IOzone enabled me to control the level of data compressibility, which IOzone calls dedupability, so I could test the impact on performance. I tested write and read performance as well as random read, random write, fwrite, and fread performance. I ran each test 10 times and reported the average and standard deviation.
The three write tests all exhibited the same general behavior. More specifically:

As the level of compressibility decreases, the performance drops off fairly quickly.
As the level of compressibility decreases, there is little variation in performance as the record size changes (over the record sizes tested).

The absolute values of the performance varied for each test, but for the general write test, the performance went from about 260 MB/s (close to the rated performance) at 98 percent dedupable data to about 97 MB/s at 2 percent dedupable data for a record size of 1MB.
The three read tests also exhibited the same general behavior. Specifically:

The performance drops only slightly with decreasing compressibility (dedupability).
As the level of compressibility decreases, the performance for larger record sizes actually increases.
As the level of compressibility decreases, there is little performance variation between record sizes.

Again, the absolute performance varies for each test, but the trends are the same. But basically, the real-time data compression does not affect the read performance as much as it does the write performance.
The important observation from these tests is that the performance does vary with data compressibility. I believe that SandForce took a number of applications from its target markets, studied the data quite closely, realized that it was pretty compressible, and designed its algorithms for those data patterns. SandForce hasn't stated which markets it is targeting, so to understand the potential performance impact for your data, you need to study your data. Remember that you're not studying the compressibility of the data file as a whole but rather the chunks of data that a SandForce controller SSD would encounter. So think small chunks of data. I think you will be surprised at how compressible your data actually is. But it's always a good idea to test the hardware against your applications.
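One rough way to study your data in small chunks is to compress each chunk separately and look at the distribution of ratios. The sketch below is only an estimate under stated assumptions: `zlib` and the 4 KiB chunk size are stand-ins because the controller's algorithm and chunk size are not public, and the sample data is synthetic.

```python
import io
import os
import zlib


def chunk_ratios(stream, chunk_size=4096):
    """Yield compressed size / original size for each chunk of a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield len(zlib.compress(chunk)) / len(chunk)


def summarize(ratios, threshold=0.9):
    """Return (mean ratio, share of chunks that compress below the threshold)."""
    ratios = list(ratios)
    mean_ratio = sum(ratios) / len(ratios)
    compressible_share = sum(r < threshold for r in ratios) / len(ratios)
    return mean_ratio, compressible_share


# Synthetic stand-in data: eight zero-filled chunks followed by eight random chunks.
sample = io.BytesIO(bytes(8 * 4096) + os.urandom(8 * 4096))
mean_ratio, share = summarize(chunk_ratios(sample))
print(f"mean ratio = {mean_ratio:.2f}, compressible chunks = {share:.0%}")
```

To examine real data, pass a file opened with `open(path, "rb")` in place of the `io.BytesIO` object.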
April 12, 2011
By Jeffrey Layton

Thursday, July 14, 2011

7 Hot Cloud Implementations

The nebulous nature of the cloud makes this a highly subjective article. There are so many definitions of the cloud, so many vendors rushing into the market, and so many new technologies that choosing the standouts is something of a crapshoot. But here are seven good examples of cloud services and implementations that make sense and might add value in an enterprise setting.

1. Desktop Cloud ~ Applied Materials recently moved from 17 decentralized IT groups down to one. It decided to rid itself of the expense of changing out desktop PCs every three years, not to mention the cost of maintaining them. Instead, it virtualized many of its desktops, starting with its higher end Computer Aided Design (CAD) users that operate for the company from multiple sites around the world.

"We developed a desktop cloud for CAD," said Jay Kerley, Deputy CIO of Applied Materials.
He didn't see the point of keeping data tied to the desktop. By putting it on the cloud, the data follows users wherever they go, and they need only a screen to visualize it. Under the desks are stripped-down desktop blades with a Graphics Processing Unit (GPU) in place of big-ticket workstations. In addition, high-speed networking is piped in to each cubicle to eliminate network slowdowns. Kerley added that HP Remote Graphics Software (RGS) was the final element that pulled everything together, enabling CAD professionals to collaborate in real time by accessing the cloud and seeing rich 3D designs on screen.

2. Using the Cloud to Lower Innovation Costs ~ Dave Smoley, CIO of Flextronics International, used the cloud to lower the cost of innovation within a company that has 27 million manufacturing square feet at 130 locations in 30 countries. It operates two data centers with more than 400 TB of storage. Flextronics spends less than 1 percent on IT. Yet Smoley said unbelievably tight budgets necessitate tremendous internal innovation. In such an environment, he said, it often pays big dividends to look for newer ways to do things rather than opting for the industry leader that "everybody uses."

For example, the company had many human resources applications running throughout the enterprise. When it came time to centralize on one platform, conventional wisdom pointed to the established HR package software used in 70 percent of large organizations. Although one business unit already used it, Flextronics partnered with a small cloud vendor to find a simpler, more usable, faster and cheaper online tool.
"Leadership questioned not going with the market leader, but it saved us more than $15 million," said Smoley.

3. Internal Cloud ~ This example also comes from Flextronics. It wanted to harness social networking to speed up global collaboration. With many solutions on the market, the company again looked to cut costs by allowing a small Flextronics software team based in Ukraine to experiment with an internal Facebook-type application it developed called Whisper Enterprise Collaboration, which has been piloted and is now being rolled out across the operation. Huge savings resulted from developing it using internal resources.

Similarly, Flextronics created an internal YouTube-type video sharing app for engineers to share problems and solutions with their peers around the world. Instead of spending hundreds of thousands, it ended up costing $8000 a year to connect to the app and host videos on the cloud.

"Companies will sell you $250,000 worth of equipment to house all your media, but why invest in it if it's obsolete in six months," said Smoley. "The cloud and the consumerization of IT are having a huge impact. Anyone can find a good bottle of wine for $60, but the trick is to find one for $10."

4. Bleaching the Cloud ~ Clorox Company had an aging infrastructure and had historically underinvested in IT. But as it expanded into a global market, this had to change. The company opened two centralized data centers, outsourced some data center hosting services, and put other services on the cloud. A year ago it was running Windows 2000 with four-year-old Lotus email and aging BlackBerrys; it has since upgraded to laptops running the latest versions of Windows, new smartphones and iPads. Despite having no money for this project, IT returned $500,000 to the bottom line.

"We achieved this by adding a lot more cloud based services and web-based apps," said Ralph Loura, CIO of Clorox.
For example, the cloud is used for Microsoft Exchange and SharePoint. The company tested them for two months then implemented.
"It works really well," said Loura. "Employees can use them from home and synch/connect from anywhere without having to go through a virtual private network firewall. And because it is on the cloud, it didn't cost anything more when all factors are taken into account."

5. Apps to the Cloud ~ HP CIO Randy Mott is pushing forward on a strategy to increase efficiency by moving many IT apps to the cloud. A global inventory revealed that eight to 10 apps were doing the same thing in different parts of HP operations. He blamed this, in part, for keeping IT focused too much on maintenance of the existing infrastructure rather than on innovation to increase business productivity. The inventory also revealed that HP had 85 data centers in 29 countries running more than 7,000 applications, 700 data marts and 1,240 active business projects, all for internal IT. From that, the company went to six data centers, fewer than 1,700 apps, one enterprise data warehouse and 500 active business projects.
"That kind of inefficiency meant we could only spend 10 percent of our time on innovation and the rest on keeping the lights on," said Mott.
The company divided its apps into two categories. Enterprise applications will comprise 45 percent of the consolidated HP global IT platform and will be hosted internally. The rest will be delivered as software as a service on the cloud.

"It takes leadership and vision to adopt a next-generation architecture and to push through all the barriers to achieve real gains," said Mott. "By doing so, we took IT spend from 4 percent of our revenues down to 2 percent."

6. Cloud Acceleration ~ An industrial pump company known as Pump Solutions Group (PSG) has a data center in Southern California that it augments with cloud-based storage. Jeff Rountree, Global Network Manager for PSG, implemented cloud resources managed by AT&T, which has rapidly established itself as one of the major players in cloud infrastructure. But he realized that solving mushrooming storage demands by throwing everything onto the cloud was a recipe for trouble. Initially, the apparent low cost of storage would save the organization money.
However, as data was placed on the cloud indiscriminately, he foresaw a future where storage again began to cost the organization too much. After all, most cloud providers charge by GB, often based on how many GBs uploaded and how many downloaded.
Instead of shipping everything to the cloud, therefore, Rountree saw the value of deduplicating and encrypting his backups before sending them to AT&T. He purchased the Whitewater Cloud Storage Accelerator by Riverbed.
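The idea of deduplicating before upload can be illustrated with a minimal sketch. It assumes fixed-size blocks identified by their SHA-256 digest, which is a common textbook approach and not a description of how the Whitewater appliance works; the function names and the synthetic backup are hypothetical.

```python
import hashlib


def dedupe(data, block_size=4096):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}    # digest -> block contents (what would be uploaded)
    recipe = []   # ordered digests needed to rebuild the original data
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Synthetic backup: 100 blocks built from only 10 distinct blocks.
unique_blocks = [bytes([i]) * 4096 for i in range(10)]
backup = b"".join(unique_blocks * 10)

store, recipe = dedupe(backup)
stored_bytes = sum(len(block) for block in store.values())
print(f"original: {len(backup)} bytes, stored: {stored_bytes} bytes")
```

In a pay-per-GB model, only the unique blocks and the small recipe would need to be uploaded and paid for.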
"Why pay for 100 GB when you can deduplicate it and pay for 10 GB instead," Rountree said.
For disaster recovery purposes, he set it up so that AT&T would keep local and remote copies of his backups to ensure redundancy and failover, should the AT&T systems suffer an event, while also keeping a master copy of his data onsite in the Whitewater appliance.
"Whitewater Accelerators optimize and deduplicate data, so that keeps my costs down in a pay-as-you-go cloud model," said Rountree. "Backup times have been cut in half, and we have entirely eliminated the practice of having to stage backups on disk before sending them to tape. In fact, we have no more need to truck backup tapes offsite."

7. Cloud Philosophy ~ The cloud has even given rise to high-flown philosophizing about how it can change the world. Michael Schrage, research fellow at MIT Sloan School's Center for Digital Business, said, "The cloud is the greatest medium for rapid multi-modal experimentation and test in the history of the world."
What he's talking about is the future of the agile infrastructure. Many organizations, he said, decline to ask fundamental questions about the business they are in and the value they provide. Just being a technology company, for example, is not enough. He said that infrastructure has a very bad brand -- regarded as overhead. IT has to get away from that, and he pushes the cloud heavily as the best approach. But he cautioned that many in IT, by retaining old beliefs in centralized IT, were under threat by the cloud.
"Business managers who have had trouble getting IT to experiment and test things out, are going to the cloud to do it themselves," said Schrage.
Due to the explosion in data volumes, he thinks R&D is gradually becoming E&S (experiment and scale).
"Storage and IT owns scaling, so why not adopt the experimental side?" said Schrage. "Instead of being in the back office, you could move to the front line."
The big inflection point is whether to do it with internal IT or with the cloud. It is up to IT whether it resists this movement and seeks to live within a data center silo, he said, or adopts the cloud and evolves into this new role. Otherwise, outfits like Amazon could well see themselves taking over more and more of the storage role via online services.

Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).

May 26, 2011

By Drew Robb

Tuesday, July 12, 2011

Custom Fibre SAN Storage Solutions for a wide range of systems.

The Aberdeen AberSAN-FC Kit is a Fibre Channel storage area network (SAN) package that provides virtually unlimited storage expansion via incremental additions of Aberdeen XDAS RAID expansion subsystems. The AberSAN-FC provides the building blocks to achieve future storage expansion for environments with mission-critical and high-bandwidth applications. By delivering a high-performance, high-capacity storage solution, the AberSAN-FC offers customers a flexible, low-cost, low-maintenance Fibre storage area network. A key advantage of the XDAS is that it uses disk-to-disk SATA storage via a Fibre Channel connection and can be daisy-chained to provide larger storage capacities.

Purpose-built for high-volume storage, the innovative AberSAN-FC reduces the cost of building from 4TB up to 180TB in a consolidated storage array system from a single Fibre switch with all the benefits of an enterprise-class SAN solution. For maximum storage availability, additional Fibre switches can be easily integrated into the array to surpass 1PB (1 petabyte) of storage. The scalability of the AberSAN-FC Kit dramatically lowers the Total Cost of Ownership (TCO) and is backed by an industry-leading 5-year warranty to ensure maximum investment protection. Choose the appropriate Fibre kit that best suits your storage scalability needs.

Single Switch Fibre SAN Kit

Dual Switch Fibre SAN Kit

Thursday, July 7, 2011

Is It Time to Add More Storage?

On any given day, you'll find system administrators searching every corner of their desks, their data centers, and their secret hardware stashes for more storage, much like junkies trying to score their next drug fix. You'll hear sweet nothings pour from the mouths of managers in praise of the awesome job the storage guys are doing, and you'll smell sacrifices of pizza, hot wings, and various baked goods to the SAN gods, only to hear the meek words just above a whisper: "We're out of space. We need to purchase more."

We're all storage junkies, and we all need rehab.

From this point forward, things go awry. "How can we be out of space?" the wary project manager asks. "We purchased 50 TB less than six months ago." Yes, you did purchase 50 TB less than six months ago as part of your physical-to-virtual (P2V) initiative. But, less than halfway into the transition, you're out of space. In fact, the space you have is over provisioned.

"But, we're using thin provisioning, we should have plenty of space," he states emphatically.

Thin provisioning. Yes, another good "in theory" practice that works everywhere except the production data center.

Over provisioned space is a rampant problem in data centers. Check out any of your virtual cluster datastores and report what you see. You'll see that almost every LUN is full or over provisioned. Why? Thin provisioning. Thin provisioning is a great idea if your data never grows or grows so slowly that you'll never fill up the allotted resources in a system's expected lifetime. However, to believe that space remains constant is a fallacy that causes more outages than failed hardware. How often do you experience an outage related to filled space due to over provisioning? Was this over provisioning related to thin provisioning for your systems?

The answer to both questions is likely, "Yes."
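To put a number on the problem, here is a minimal sketch that computes how far a datastore is over provisioned and how long it has before it fills. The `ThinDisk` model, the disk names, the sizes, and the constant growth rate are all assumptions made up for illustration; they do not come from any particular hypervisor or SAN.

```python
from dataclasses import dataclass


@dataclass
class ThinDisk:
    """One thin-provisioned virtual disk (hypothetical model, sizes in GB)."""
    name: str
    provisioned_gb: float  # size promised to the guest
    used_gb: float         # space actually written so far


def overcommit_ratio(capacity_gb: float, disks: list[ThinDisk]) -> float:
    """Provisioned space divided by physical capacity.

    A value above 1.0 means the datastore is over provisioned.
    """
    return sum(d.provisioned_gb for d in disks) / capacity_gb


def days_until_full(capacity_gb: float, disks: list[ThinDisk],
                    growth_gb_per_day: float) -> float:
    """Days until written data fills the datastore, assuming constant growth."""
    free_gb = capacity_gb - sum(d.used_gb for d in disks)
    if free_gb <= 0:
        return 0.0
    if growth_gb_per_day <= 0:
        return float("inf")
    return free_gb / growth_gb_per_day


# Made-up example: three thin disks sharing a 1,000 GB datastore.
disks = [
    ThinDisk("web01", provisioned_gb=500, used_gb=200),
    ThinDisk("db01", provisioned_gb=1000, used_gb=600),
    ThinDisk("files01", provisioned_gb=500, used_gb=100),
]
capacity_gb = 1000

ratio = overcommit_ratio(capacity_gb, disks)
days = days_until_full(capacity_gb, disks, growth_gb_per_day=5)
print(f"overcommit ratio: {ratio:.2f}")
print(f"days until full:  {days:.0f}")
```

The point of the sketch is the second number: a datastore can look healthy on used space today and still be weeks away from an outage once growth is taken into account.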

SAN vendors and virtualization software companies claim that over provisioning is an acceptable practice, and it's actually a feature of their systems. While some workloads (file services, network services, software repositories) operate well on thin-provisioned surfaces, most do not.

Does this mean thin provisioning is bad, and you should never use it? Certainly not. Thin provisioning from a virtual machine perspective is a bad practice. It leads to over provisioning and outages. Thin provisioning from the SAN configuration viewpoint is a good practice. It leads to less storage waste and faster expansion as systems request space from the host's available storage pool.

But, the solution isn't thin provisioning. The solution is to take a conservative approach to storage provisioning and storage use. System administrators and SAN administrators can expand volumes as needed so over provisioning is not necessary. The exception to this, however, is the system volume, which in some cases cannot be extended.

The typical organization takes the attitude that there is unlimited storage available, and it requests far more than it will ever realistically use for a system. This storage addiction led vendors to develop thin provisioning and over provisioning. To alleviate this addiction, organizations don't take a conservative approach -- they add more storage and waste it.

But, we're not totally to blame for our storage addiction. Organizations are conditioned to need bigger, better, faster and more storage. We're rewarded with larger disks that promise extreme speeds coupled with lower cost and lower power consumption. Operating systems are bloated. File sizes are bloated. Databases have increased in size exponentially. We now employ data warehouses, data marts, and data malls in every aspect of business. We're information junkies. We're data junkies. And, we need more space to accommodate our addiction. How long ago were you impressed with a 500MB database? Five years? Now, we hardly flinch at 2TB databases. We're no longer surprised by bloatware; nor are we impressed with it. Storage is cheap. It's fast. It's available. And, you're entitled to more of it.

We're storage junkies and there seems to be no rehabilitation for us in the foreseeable future. We're happy with our data hoarding. We're happy with our ever-increasing storage waistlines. We're happy with our addiction. Now, stop reading this and go allocate another 500GB LUN for me, I'm hurting bad.

Ken Hess is a freelance writer who writes on a variety of open source topics including Linux, databases, and virtualization. He is also the coauthor of Practical Virtualization Solutions, which was published in October 2009. You may reach him through his web site at http://www.kenhess.com.

Wednesday, July 6, 2011

6 Tips for Better Small Business Storage and Data Protection

Small businesses rely on technology to reduce costs and increase productivity. As business demands grow against shrinking budgets and resources, the latest small business technology trends emphasize saving money and improving speed. While that's good for a business’ bottom line, the resulting data from increased speeds can place a burden on older storage infrastructures.
Access to advanced technology makes jobs easier and faster, but workers’ productivity gains from new technology result in more data than many small businesses are accustomed to managing. Business continuity and disaster recovery (BC/DR) planning -- on which many small businesses are just beginning to focus -- compounds the stress, and it also drives demand for greater storage capabilities.
Consequently, procuring and maintaining adequate space for all the new electronic information, not to mention managing, storing and backing up this extra data, is a growing challenge for small businesses.
Small business owners must ask and answer many questions when it comes to storage: How much data do we need to store? Can our current solution withstand our projected business growth? Should we invest in new technologies? Do we have the ability to resume operations quickly after a disaster, such as a fire or flood?
The good news is that there are many cost-effective options to simplify data storage. Many vendors offer straightforward and accessible solutions with a broad range of capabilities. Server and storage specialists developed the following six tips to help small businesses jumpstart the process of choosing and maintaining the right storage solution for their needs:

1. Speak with a Human Being

Storage is not a one-size-fits-all solution. Small businesses must work closely with vendors to choose the best solution for all current and future storage needs and requirements. Additional benefits of forming solid relationships with vendors include further insight into equipment maintenance and troubleshooting processes, the latest storage trends and capabilities and increased flexibility with pricing.

2. Evaluate Network-Attached Storage (NAS)

Consider replacing file servers with network-attached storage (NAS) -- a hard-disk, file-based data storage solution dispersed throughout a network -- to handle the increasing amount of data contained in email and electronic documents. NAS provides employees with quick, centralized access to data on the network and ultimately improves data sharing.

Most network storage devices now come with built-in data backup applications for added protection and convenience. More sophisticated devices include multiple drives that provide more data protection and expansion opportunity. Looking forward, more NAS devices will include built-in wireless connectivity, which simplifies data storage and file sharing one step further by providing direct access to data from any location in the office.

3. Protect Your Critical Business Data

Aside from people, information is the most critical asset for almost every small business. Organizations should first categorize data by function and application, and then rank the business impact of losing data in each category. The greater the impact, the more often you should back up the category.

Upgrading backup storage systems to faster versions reduces the time required to complete a backup cycle, and it may even lead you to choose continuous backup for mission-critical data. Multiple copies of data should be stored off site, at a remote location, far from the primary data center.
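The categorize-then-rank approach can be sketched in a few lines. The categories, the 1-to-5 impact scores, and the backup intervals below are all hypothetical placeholders; a real policy should come from your own assessment of what each kind of data loss would cost the business.

```python
# Hypothetical categories and impact scores (1 = minor, 5 = severe).
impact_by_category = {
    "accounting": 5,
    "customer orders": 5,
    "email": 4,
    "marketing assets": 2,
    "archived projects": 1,
}

# Assumed policy: hours between backups for each impact score.
INTERVAL_HOURS = {5: 1, 4: 4, 3: 24, 2: 72, 1: 168}


def backup_plan(impacts):
    """Return (category, impact, interval_hours) tuples, highest impact first."""
    ranked = sorted(impacts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, score, INTERVAL_HOURS[score]) for name, score in ranked]


plan = backup_plan(impact_by_category)
for name, score, hours in plan:
    print(f"{name:18} impact={score}  back up every {hours} h")
```

Keeping the policy in a single table makes it easy to review with the business owner and to tighten the intervals later if faster backup hardware makes that practical.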

4. Consolidate to Save Space and Money

As companies increase in size and complexity, documents and data tend to be shared throughout the organization, increasing storage volumes exponentially. As storage volume increases, businesses should consider de-duplication software and a tiered storage system to reduce space demands and eliminate the need to purchase extra storage prematurely. Consolidation also simplifies enforcement of records management policies and practices, reducing liability exposure.
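To see why shared documents inflate storage, and what de-duplication can recover, consider this rough sketch. It detects whole-file duplicates by hashing file contents with SHA-256. This is a deliberate simplification: commercial de-duplication products commonly work on blocks or chunks rather than whole files, and the file names and contents here are invented for the example.

```python
import hashlib


def dedup_savings(files):
    """Estimate whole-file de-duplication savings.

    files: mapping of path -> bytes content.
    Returns (total_bytes, unique_bytes).
    """
    seen = set()
    total = unique = 0
    for content in files.values():
        total += len(content)
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            # First time this content appears, so it must be stored.
            seen.add(digest)
            unique += len(content)
    return total, unique


# Invented example: one report saved to three departments' shares.
report = b"Q2 sales report " * 1000
files = {
    "sales/q2.doc": report,
    "finance/q2-copy.doc": report,
    "exec/q2-final.doc": report,
    "hr/handbook.doc": b"Employee handbook " * 500,
}

total, unique = dedup_savings(files)
print(f"stored without de-duplication: {total} bytes")
print(f"stored with de-duplication:    {unique} bytes")
print(f"space saved:                   {1 - unique / total:.0%}")
```

Running something similar against a sample of your own file shares gives a rough idea of whether de-duplication is worth evaluating before you buy more disk.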

5. Prolong the Life of Existing Equipment

Heat generated by IT hardware, especially when it is located in small spaces, can reduce the life span of the equipment. Use blanking panels in server racks and air locking grommets in raised-floor panels to minimize cold air loss. Keeping the cold air in place will not only keep the equipment healthier and extend its life span, but it also reduces cooling costs.

6. Enable Remote Access

Small businesses sometimes focus so much on protecting and backing up data that they don't take steps to ensure their employees can access that data remotely if the office is closed due to a disaster. Remote-access software provides employees with access to networked server or desktop information outside of the office.

These simple tips will help you make informed decisions when purchasing, implementing and maintaining small business storage solutions. Building relationships with vendors, exploring your storage options and properly caring for existing equipment will help you manage the increased amount of data, contain costs and protect your business against lost or damaged information -- now and in the future.

By Zachary Ferdinand