<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AkillesBlog &#187; gentoo</title>
	<atom:link href="http://blog.akilles.org/tag/gentoo/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.akilles.org</link>
	<description>Talk on programming, computers, electronics, web etc</description>
	<lastBuildDate>Sun, 23 Jan 2011 23:46:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Gentoo md (Software RAID) RAID5 disk crash</title>
		<link>http://blog.akilles.org/2008/05/13/exciting-days-with-md-raid5-disk-crash/</link>
		<comments>http://blog.akilles.org/2008/05/13/exciting-days-with-md-raid5-disk-crash/#comments</comments>
		<pubDate>Mon, 12 May 2008 23:55:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[crash]]></category>
		<category><![CDATA[gentoo]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[md]]></category>
		<category><![CDATA[RAID]]></category>

		<guid isPermaLink="false">http://blog.akilles.org/?p=64</guid>
		<description><![CDATA[How to re-create and assemble an md software RAID5 array on Gentoo when two disks have bad superblocks and one of them is permanently faulty.]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s spring, and the weather outside is outstanding. The days are longer, the birds are singing and people are enjoying nature as it wakes up for a new summer of growth.<br />
With exams coming up soon, I had accepted that I won&#8217;t have much time to enjoy the spring for another few weeks. But at least I had set aside the time I need to prepare for my exams&#8230;<br />
&#8230;or so I thought. Then a mail from Charlie Root on my Gentoo server landed in my inbox. Hardly an invitation to a bicycle ride in the sunshine:<br />
<code>WARNING: Some disks in your RAID arrays seem to have failed!</code> was the message.</p>
<p><i>Damn,</i> this is the third year in a row(!) that something has gone wrong with my server in the middle of my exam preparations. Why can&#8217;t it happen in the fall, when the weather is bad and I have time for such challenges?<br />
Well, I suppose it&#8217;s <em><a href="http://en.wikipedia.org/wiki/Murphy's_law">Murphy&#8217;s law</a></em>. Anyway, on to a more precise description of the problem&#8230;</p>
<h2>The problem</h2>
<p>I have 6 SATA disks in a software RAID5 array (using Linux md on Gentoo), with an XFS filesystem on top.</p>
<p>The actual error is that two(!) of my disks became faulty overnight. Diagnostics follow.</p>
<p><em>/proc/mdstat</em>:<br />
<code>md0 : active raid5 sdf1[5] sde1[6](F) sdd1[3] sdc1[2] sdb1[1] sda1[7](F)<br />
      (some number) blocks level 5, 64k chunk, algorithm 2 [6/4] [_UUU_U]</code><br />
I really got nervous here: 2 faulty disks in a RAID5 array means trouble, since RAID5 only tolerates the loss of one. But why on earth did two disks fail at the same time? I&#8217;m not sure yet.</p>
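<p>To read the status string without counting characters by hand, here is a minimal sketch. It assumes that each position in the bracketed string stands for one RAID slot, counted from 0, with <code>_</code> marking an empty slot. The function name <code>missing_slots</code> is made up for this example.</p>

```shell
#!/bin/sh
# Sketch: print the slot numbers that /proc/mdstat shows as empty ("_").
# missing_slots is a made-up helper name; it reads mdstat-style text on stdin.
missing_slots() {
    awk '
        match($0, /\[[U_]+\]/) {
            # Strip the surrounding brackets, e.g. "[_UUU_U]" becomes "_UUU_U".
            s = substr($0, RSTART + 1, RLENGTH - 2)
            out = ""
            for (i = 1; i <= length(s); i++)
                if (substr(s, i, 1) == "_")
                    out = out (out == "" ? "" : " ") (i - 1)
            print out
            exit
        }'
}

# The two mdstat lines quoted above, fed in by hand:
printf '%s\n' \
    'md0 : active raid5 sdf1[5] sde1[6](F) sdd1[3] sdc1[2] sdb1[1] sda1[7](F)' \
    '      (some number) blocks level 5, 64k chunk, algorithm 2 [6/4] [_UUU_U]' |
    missing_slots    # prints: 0 4
```

<p>On a live system the function can read the file directly: <code>missing_slots &lt; /proc/mdstat</code>. Slots 0 and 4 agree with the kernel log, which reports only raid disks 1, 2, 3 and 5 as operational.</p>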
<p><em>dmesg</em> excerpt:<br />
<code>md: Autodetecting RAID arrays.<br />
md: autorun ...<br />
md: considering sdf1 ...<br />
md:  adding sdf1 ...<br />
md:  adding sde1 ...<br />
md:  adding sdd1 ...<br />
md:  adding sdc1 ...<br />
md:  adding sdb1 ...<br />
md:  adding sda1 ...<br />
md: created md0<br />
md: bind&lt;sda1&gt;<br />
md: bind&lt;sdb1&gt;<br />
md: bind&lt;sdc1&gt;<br />
md: bind&lt;sdd1&gt;<br />
md: bind&lt;sde1&gt;<br />
md: bind&lt;sdf1&gt;<br />
md: running: &lt;sdf1&gt;&lt;sde1&gt;&lt;sdd1&gt;&lt;sdc1&gt;&lt;sdb1&gt;&lt;sda1&gt;<br />
md: kicking non-fresh sde1 from array!<br />
md: unbind&lt;sde1&gt;<br />
md: export_rdev(sde1)<br />
md: kicking non-fresh sda1 from array!<br />
md: unbind&lt;sda1&gt;<br />
md: export_rdev(sda1)<br />
raid5: device sdf1 operational as raid disk 5<br />
raid5: device sdd1 operational as raid disk 3<br />
raid5: device sdc1 operational as raid disk 2<br />
raid5: device sdb1 operational as raid disk 1<br />
raid5: not enough operational devices for md0 (2/6 failed)<br />
RAID5 conf printout:<br />
 --- rd:6 wd:4<br />
 disk 1, o:1, dev:sdb1<br />
 disk 2, o:1, dev:sdc1<br />
 disk 3, o:1, dev:sdd1<br />
 disk 5, o:1, dev:sdf1<br />
raid5: failed to run raid set md0<br />
md: pers-&gt;run() failed ...<br />
md: do_md_run() returned -5<br />
md: md0 stopped.<br />
md: unbind&lt;sdf1&gt;<br />
md: export_rdev(sdf1)<br />
md: unbind&lt;sdd1&gt;<br />
md: export_rdev(sdd1)<br />
md: unbind&lt;sdc1&gt;<br />
md: export_rdev(sdc1)<br />
md: unbind&lt;sdb1&gt;<br />
md: export_rdev(sdb1)</code></p>
<p>It seems that sda is the only disk that is really faulty (<i>mdadm --examine</i> says it has no superblock at all), but its crash must have messed up the superblock of sde as well. Fortunately, the superblock of sde could be re-created, and with it the RAID5 array as a whole: RAID5 tolerates one failed disk (here, sda), but then there is no safety net left until the faulty disk is replaced.</p>
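<p>As far as I understand it, md calls a member &#8220;non-fresh&#8221; when the event counter in its superblock is behind the counters of the other members. Below is a rough sketch for comparing those counters. It assumes that <code>mdadm --examine</code> prints an <code>Events :</code> line for the member. The helper names and the <code>MDADM</code> override are made up for this example.</p>

```shell
#!/bin/sh
# Sketch: print "<device> <event counter>" for each array member.
# Needs root for real devices. MDADM can be overridden; that is only a
# convenience of this sketch, not an mdadm feature.
MDADM=${MDADM:-mdadm}

events_of() {
    # $1: member partition, e.g. /dev/sdb1
    # Prints the value of the "Events :" line; fails if there is none.
    "$MDADM" --examine "$1" 2>/dev/null |
        awk '$1 == "Events" && $2 == ":" { print $3; found = 1; exit }
             END { exit !found }'
}

list_events() {
    for dev in "$@"; do
        if ev=$(events_of "$dev"); then
            printf '%s %s\n' "$dev" "$ev"
        else
            # No superblock, unreadable disk, or missing permissions.
            printf '%s no-events-found\n' "$dev"
        fi
    done
}

# Usage (as root): list_events /dev/sd[a-f]1
```

<p>A member whose counter is lower than the rest is the one md will kick; a member without any counter at all, like sda here, has no usable superblock.</p>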
<h2>The (temporary) solution</h2>
<p>The superblock information for each disk in the RAID array was extracted with<br />
<code>mdadm --examine /dev/sda1</code><br />
and so on. I saved (that is, copy-pasted) the exact output for all disks, so that I could at least use it to choose the options for the <i>mdadm --create</i> command in the next paragraph.</p>
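<p>Instead of copy-pasting, the same information can be saved to one file per disk. This is only a sketch: the function name, the <code>.examine</code> file suffix and the <code>MDADM</code> override are made up for this example.</p>

```shell
#!/bin/sh
# Sketch: save the output of "mdadm --examine" for every member to its own file.
# Needs root for real devices. MDADM can be overridden (a convenience of this
# sketch only).
MDADM=${MDADM:-mdadm}

save_superblocks() {
    # $1: output directory; remaining arguments: member partitions
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        name=$(basename "$dev")
        # Keep error messages in the file too, so a missing superblock is recorded.
        if ! "$MDADM" --examine "$dev" > "$outdir/$name.examine" 2>&1; then
            echo "warning: mdadm --examine failed for $dev" >&2
        fi
    done
    return 0
}

# Usage (as root): save_superblocks /root/md0-superblocks /dev/sd[a-f]1
```

<p>Keep the files somewhere that is not on the broken array.</p>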
<p>I ran (<b><em>Warning:</em></b> do this at your <b>own risk</b>! It&#8217;s incredibly important to set the right options here, so be sure you have read and understood <i>man mdadm</i> first! Getting them wrong can result in loss of data!):<br />
<code>mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 missing /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1</code>, which brought the array back up in degraded mode, with the faulty disk sda marked as missing. This was enough to mount the array, but not without problems.</p>
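<p>To reduce the risk of a typo, the command can be assembled from a saved <i>mdadm --examine</i> output and printed for review before anything is run. The sketch below only prints; it never calls mdadm. It assumes the field labels <code>Raid Level</code>, <code>Raid Devices</code>, <code>Chunk Size</code>, <code>Layout</code> and <code>Version</code> as I remember them from the examine output, so check them against your own files. The function names are made up. It spells out chunk size, layout and metadata version explicitly, which my command above left to the defaults of my mdadm version; newer mdadm versions may default to a different metadata format.</p>

```shell
#!/bin/sh
# Sketch: build (but do not run) an "mdadm --create" command from a saved
# "mdadm --examine" output. All names here are made up for illustration.

field() {
    # $1: field label as printed by mdadm --examine, $2: saved output file
    sed -n "s/^ *$1 : *//p" "$2" | head -n 1
}

print_create_cmd() {
    # $1: saved --examine output of a healthy member
    # remaining arguments: members in slot order, "missing" for the dead disk
    f=$1
    shift
    level=$(field 'Raid Level' "$f")
    count=$(field 'Raid Devices' "$f")
    chunk=$(field 'Chunk Size' "$f")
    layout=$(field 'Layout' "$f")
    version=$(field 'Version' "$f")
    if [ -z "$level" ] || [ -z "$count" ] || [ -z "$chunk" ] ||
       [ -z "$layout" ] || [ -z "$version" ]; then
        echo "could not read all fields from $f" >&2
        return 1
    fi
    if [ "$#" -ne "$count" ]; then
        echo "expected $count members (including 'missing'), got $#" >&2
        return 1
    fi
    # Old-style superblocks are reported as 00.90.00 or 0.90.00.
    case "$version" in
        0.90*|00.90*) meta=0.90 ;;
        *) meta=$version ;;
    esac
    # --chunk takes KiB by default, so drop the trailing K.
    echo "mdadm --create --verbose /dev/md0 --level=$level --raid-devices=$count --chunk=${chunk%K} --layout=$layout --metadata=$meta $*"
}

# Usage: print_create_cmd sdb1.examine missing /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
```

<p>The member order matters: it has to follow the slot numbers from the superblocks, which is why sda, in slot 0, is replaced by <code>missing</code> at the front.</p>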
<h2>Mounting the degraded array</h2>
<p>A first attempt at &#8220;mount /dev/md0 /mnt&#8221; gave the error &#8220;mount: Structure needs cleaning&#8221;. This is the XFS filesystem reporting that it is not entirely consistent. I could probably run xfs_repair (I did run it in no-modify mode, with the -n option), but I&#8217;m not willing to risk my data on that yet. Instead, I got the device mounted with this command:<br />
<code>mount -r -o norecovery /dev/md0  /mnt</code></p>
<p>In this way, I can now access my data and back it up. Without repairing the XFS filesystem, some of the data is probably corrupted, but hopefully most of it is recoverable&#8230;</p>
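<p>For the backup itself, something like the following sketch keeps a log of the files that could not be read. The function name and the paths in the usage line are placeholders, and <code>cp -a</code> is just one way to do it.</p>

```shell
#!/bin/sh
# Sketch: copy everything from the read-only mount and log what fails.
# backup_tree and the paths in the usage line are placeholders.
backup_tree() {
    # $1: source directory, $2: destination directory, $3: error log file
    mkdir -p "$2" || return 1
    # "src/." copies the contents of src, including hidden files, into dst.
    cp -a "$1"/. "$2"/ 2> "$3"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "some files could not be copied; see $3" >&2
    fi
    return "$status"
}

# Usage: backup_tree /mnt /path/to/backup /path/to/backup-errors.log
```

<p>The error log then doubles as a list of files to check again once the filesystem has been repaired.</p>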
<h2>Some of the pages I visited in my frustration</h2>
<ul>
<li><a href="http://www.issociate.de/board/post/461227/kicking_non-fresh_member_from_array?.html">Kicking non-fresh member from array</a></li>
<li><a href="http://www.issociate.de/board/post/413817/Linux_Software_RAID_Bitmap_Question.html">Linux RAID bitmap question</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.akilles.org/2008/05/13/exciting-days-with-md-raid5-disk-crash/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

