Bulletproof Storage
Posted 2013-5-22 09:58:46

      Disk systems will repair themselves or can be left unrepaired for years.
      You can fly a two-engine plane with one engine, but how many passengers
      would want to be on it?
      That’s the idea behind “bulletproof storage,” a concept that IBM has been
      developing for two years and plans to begin unveiling incrementally over
      the next one to three years.
      IBM’s technology initiative deals with fault tolerance in every part of a
      storage system: disk, controller, network cards, power supplies and
      software. By building more-robust storage systems that can defer
      replacement of failed parts for up to three years because of redundant
      components, IBM believes it can also eliminate many human errors that
      happen when failing components are replaced.
      According to Stanley Zaffos, an analyst at Gartner Inc., it will be
      another five to 10 years before the bulletproof storage concept is
      broadly embraced by users. But once it is, storage systems will require
      less maintenance and, therefore, cost less to maintain.
      “We know how to build very reliable code. We use appliances every day that
      have software built into them that work forever: your automobile, your
      calculator, the disk drive in your PC, your telephone,” Zaffos says.
      But IBM is looking to attack far more complex systems than telephones or
      calculators.
      Under its bulletproof initiative, IBM is addressing disk-sector failures
      that grow along with disk capacity. While disk capacities double every 12
      to 18 months, uncorrectable read/write error rates haven’t improved, nor
      has the probability of an uncorrectable error occurring on a disk read
      decreased. There are more sectors on today’s disks and, therefore, a
      greater chance of an uncorrectable error.
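      To make the capacity argument concrete, the sketch below estimates the
      chance of hitting at least one unrecoverable read error while reading a
      whole disk once. The rate of one error per 10^14 bits is an assumption
      (a figure often quoted for consumer drives, not one given in the
      article). Errors are also assumed to be independent per bit, and the
      function name is hypothetical.

```python
import math


def prob_unrecoverable_read(capacity_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error when reading a full disk once.

    Assumes independent per-bit errors at rate `ure_per_bit` (assumed 1 in 10^14).
    """
    bits = capacity_tb * 1e12 * 8  # decimal terabytes -> bits
    # P(no error) = (1 - r) ** bits; log1p keeps this accurate for tiny r.
    return 1.0 - math.exp(bits * math.log1p(-ure_per_bit))


for tb in (0.5, 1, 2, 4):
    print(f"{tb:>4} TB: {prob_unrecoverable_read(tb):.1%}")
```

      Under these assumptions the error rate per bit stays fixed while the
      number of bits grows. As a result, the probability of an uncorrectable
      error roughly doubles with each doubling of capacity while it is still
      small.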
      The answer is to create self-healing capabilities for storage management
      software and more-robust RAID configurations.
      IBM says that in about a year it will release storage systems that can
      support three simultaneous disk-drive failures in a single array by
      introducing additional parity disks into RAID configurations, offering
      many times the resiliency of a RAID configuration with two parity disks.
      Today, standard systems can survive at most two simultaneous disk failures.
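      A simple binomial model shows what each extra parity disk buys. This is
      a hypothetical sketch with several assumptions:
      - disks fail independently;
      - each disk has the same probability of failing during an exposure
        window, such as a rebuild;
      - an array with k parity disks loses data only when more than k disks
        fail.
      The 12-disk array and the 1% failure probability are made-up inputs,
      not IBM figures.

```python
from math import comb


def p_data_loss(n_disks: int, parity: int, p_fail: float) -> float:
    """Probability that more than `parity` of `n_disks` fail in the same window.

    Assumes independent failures, each with probability `p_fail`.
    """
    return sum(
        comb(n_disks, k) * p_fail**k * (1 - p_fail) ** (n_disks - k)
        for k in range(parity + 1, n_disks + 1)
    )


# Hypothetical 12-disk array, 1% chance each disk fails during the window.
for parity in (1, 2, 3):
    print(f"{parity} parity disk(s): {p_data_loss(12, parity, 0.01):.2e}")
```

      With these inputs, each additional parity disk lowers the loss
      probability by more than an order of magnitude. Real failures tend to be
      correlated, so this model is optimistic.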
      But Zaffos argues that 80% of downtime today is caused by user error and
      software failures, not hardware failures. He says that the failures
      resulting from software are created by complexity and that there is an
      almost infinite number of failures that can occur in a complex system.
      IBM is addressing those code failures with a software project called
      N-Version Programming, where two pieces of code in the same application
      save data and then compare the data to ensure that there are no errors.
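      The comparison step can be sketched as follows: run independently
      written implementations of the same operation, and accept the result
      only when they agree. This is a minimal illustration of the general
      N-version idea, not IBM's code. The two XOR-parity functions are
      hypothetical stand-ins for the "two pieces of code", and any
      disagreement is simply reported as an error.

```python
from collections.abc import Callable
from functools import reduce


def parity_version_a(blocks: list[bytes]) -> bytes:
    """Version A: XOR equal-length blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def parity_version_b(blocks: list[bytes]) -> bytes:
    """Version B: written differently, XORing each block as one big integer."""
    size = len(blocks[0])
    acc = reduce(lambda x, y: x ^ y, (int.from_bytes(b, "big") for b in blocks), 0)
    return acc.to_bytes(size, "big")


def run_n_versions(versions: list[Callable[[list[bytes]], bytes]],
                   blocks: list[bytes]) -> bytes:
    """Run every version on the same input; trust the result only if all agree."""
    results = [version(blocks) for version in versions]
    if any(r != results[0] for r in results[1:]):
        raise RuntimeError("versions disagree; result is not trusted")
    return results[0]


blocks = [b"\x01\x02", b"\x03\x04", b"\xff\x00"]
print(run_n_versions([parity_version_a, parity_version_b], blocks).hex())
```

      Version B reaches the same answer by a different route. A bug in one
      version is therefore unlikely to produce the identical wrong output in
      the other, which is the property this approach relies on.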
      In N-Version Programming, two copies of data are protected using different
      means. One copy might be protected by standard RAID-5 code written by
      Programmer A.
      The second copy is protected by a different algorithm coded by Programmer
      B. That way, if the first copy gets corrupted due to a particular bug in
      the program written by Programmer A, then the second copy can be used.
      The second copy may have its own bugs, but they will manifest in different
      ways at different times, and when they do, the first copy will be the good
      one, which you can use instead. It's rather like having a second person
      check a first person's work and fix it whenever they find mistakes.
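      The two-copy scheme can be sketched in a few lines. Here CRC-32
      (`zlib.crc32`) and SHA-256 (`hashlib.sha256`) stand in for the
      "different means" of protection. The article's example uses RAID-5 for
      the first copy, so this is a simplified, hypothetical illustration, and
      the function names are invented. A read verifies copy A and falls back
      to copy B when A fails its check.

```python
import hashlib
import zlib


def protect_a(data: bytes) -> tuple[bytes, int]:
    """Copy A: guarded by a CRC-32 checksum (stand-in for Programmer A's scheme)."""
    return data, zlib.crc32(data)


def protect_b(data: bytes) -> tuple[bytes, bytes]:
    """Copy B: guarded by a SHA-256 digest (an independent scheme for Programmer B)."""
    return data, hashlib.sha256(data).digest()


def read_with_fallback(copy_a: tuple[bytes, int], copy_b: tuple[bytes, bytes]) -> bytes:
    """Return whichever copy passes its own check, preferring copy A."""
    data_a, crc = copy_a
    if zlib.crc32(data_a) == crc:
        return data_a
    data_b, digest = copy_b
    if hashlib.sha256(data_b).digest() == digest:
        return data_b
    raise OSError("both copies failed verification")


copy_a = protect_a(b"payload")
copy_b = protect_b(b"payload")
corrupted_a = (b"paylaod", copy_a[1])  # simulate a bug that garbled copy A
print(read_with_fallback(corrupted_a, copy_b))
```

      The checks are symmetric. If copy B were the damaged one, copy A would
      pass its CRC and be returned first.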
      One way IBM plans to detect and correct corrupted data is to create
      more-resilient storage software with repairable data structures. The code
      checks that certain conditions, which are described in rules, are met. For
      example, in a file system with multiple files, the sum of the space taken
      by the files plus the free space in the system must be equal to the total
      available space. The code will check this property automatically at
      various times and use a procedure to repair and fix problems if the
      property isn’t met.
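      A checkable data structure of this kind might look like the sketch
      below. The class, its fields, and the repair policy are hypothetical. It
      encodes only the rule from the example: the space used by files plus the
      free space equals the total space. As an assumption, it treats the
      per-file sizes as trustworthy, so the free-space counter is what gets
      rebuilt when the rule fails.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeMetadata:
    total_space: int
    free_space: int
    file_sizes: dict[str, int] = field(default_factory=dict)

    def invariant_holds(self) -> bool:
        # Rule from the text: space used by files + free space == total space.
        return sum(self.file_sizes.values()) + self.free_space == self.total_space

    def repair(self) -> None:
        # Assumed policy: trust per-file sizes and recompute the free-space counter.
        used = sum(self.file_sizes.values())
        if used > self.total_space:
            raise ValueError("file sizes exceed the volume; cannot repair by recount")
        self.free_space = self.total_space - used


def check_and_repair(meta: VolumeMetadata) -> bool:
    """Check the invariant and repair if needed. Returns True if a repair was made."""
    if meta.invariant_holds():
        return False
    meta.repair()
    return True


meta = VolumeMetadata(total_space=1000, free_space=300, file_sizes={"a": 500, "b": 200})
meta.free_space = 250  # simulate a corrupted free-space counter
print(check_and_repair(meta), meta.free_space)
```

      Note that the check looks only at the structure's own bookkeeping, not
      at file contents. The hard design choice is deciding, for each rule,
      which side is authoritative when the rule fails.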
      In this case, the software isn’t checking the code to see that it’s
      functioning properly and isn’t checking data contents. If certain
      properties aren’t met, the software knows how to fix the data structures.
      But don’t expect to see fruit from N-Version Programming or checkable data
      structures for another two to three years.