
Disk Scrubbing

 
Kenneth Platz
Esteemed Contributor

Disk Scrubbing

Hello everyone,

I've got a fairly involved question about disk scrubbing. We are going to be performing a DR test in the next few weeks, and one of the things we are required to do at the conclusion of the test is to scrub all data from the disks.

I know what you're thinking, just dd if=/dev/zero of= and be done with it. However, it's not that simple. Our corporate security doesn't feel that that process is sufficient. Also, we have at least 10 terabytes of disk to scrub, and time is a factor. We do have multiple systems that we can scrub from, so that is a plus.

Now, we have an approved third-party solution that performs the following passes on a disk file:

Writes all zeroes
Writes a random sequence of data
Writes an infinite string of 0xff
Writes a random sequence of data
Writes an infinite string of 0x55
Writes a random sequence of data

(Our security is pretty paranoid about being able to recover this data).
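The fill patterns in the list above can be produced from standard device files, without the third-party tool. The following is only a sketch: the function name `pattern_stream` and its pattern labels are made up for illustration, and it assumes a `tr` that passes NUL bytes through (GNU `tr` does; check yours).

```shell
# pattern_stream NAME: write an endless stream of one fill pattern to stdout.
# NAME is a hypothetical label for this sketch: zero, ff, 55, or random.
pattern_stream() {
    case "$1" in
        zero)   cat /dev/zero ;;
        ff)     LC_ALL=C tr '\000' '\377' < /dev/zero ;;   # octal 377 = 0xff
        55)     LC_ALL=C tr '\000' '\125' < /dev/zero ;;   # octal 125 = 0x55
        random) cat /dev/urandom ;;
        *)      echo "unknown pattern: $1" >&2; return 1 ;;
    esac
}
```

A pass would then look something like `pattern_stream ff | dd ibs=64k obs=1024k of=/dev/rdsk/cXtYdZ`. Using `obs=` rather than `bs=` matters here: reads from a pipe can come back short, and `obs=` makes dd collect them into full 1 MB output blocks.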

Our disks are presented to us in 70GB LUNs from a disk array (nominally EMC Symmetrix, but that doesn't really matter to us).

Now, in the past (when our disk requirements were less than half of what they are now), we basically put all our disks into a couple of volume groups, created striped filesystems on those volume groups, created a bunch of 2-gigabyte files on those filesystems, and then ran our scrubbing program on each of those files, running a number of those scrubbing programs in parallel in order to (attempt to) maximize throughput. We use 2GB files because we're not sure whether the application supports >2GB files.

Now, I've given this a fair bit of thought, trying to think how we can speed this up, since time is a serious factor here. Here are a number of possibilities I've considered:

1) Run the third-party program, but write directly to the block devices for the LUNs.

2) Same as #1, but write to the raw devices.

3) Write my own Perl or C program where I perform the above steps, but write in parallel to the block or character devices, as fast as I can send the data to them.

Now, I'm leaning towards #3, but I've never worked with writing directly to block/character devices before - are they pretty much just like write()'ing to a standard disk file? And how do you detect that you've hit the end of the disk? (Do you get an EOF type error or something?) My C is kinda rusty, but it's a pretty straightforward bit of code to write.

Any input to my thought process would be appreciated.
I think, therefore I am... I think!
6 REPLIES
James R. Ferguson
Acclaimed Contributor

Re: Disk Scrubbing

Hi Kenneth:

It's interesting that your security people are as paranoid as you say in a disaster recovery center. Having done quite a few of these, I've come to conclude that the paranoia is out-of-hand in this case.

That aside, if you want to use part of your costly test time erasing media, I'd simply use 'dd' with '/dev/zero' and '/dev/urandom' as input and make your own n-passes.

Use the raw disk device files so that you bypass the Unix buffer cache and use a large blocksize:

# dd if=/dev/zero of=/dev/rdsk/cXtYdZ bs=1024k

# dd if=/dev/urandom of=/dev/rdsk/cXtYdZ bs=1024k

When the end of the device is reached, 'dd' will simply stop. Use as many iterations as you see fit. All of the above can be scripted very simply in a shell script.
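As a rough illustration of such a script, here is a minimal sketch. The function name `scrub_device` and its arguments are hypothetical. The optional COUNT argument exists only so the loop can be tried against an ordinary file first; on a real raw device you would omit it and let dd run to the end of the device.

```shell
# scrub_device TARGET PASSES [COUNT]
#   TARGET  raw device file (or an ordinary file for a dry run)
#   PASSES  number of zero-then-random pass pairs
#   COUNT   optional number of 1 MB blocks per pass (dry runs only)
scrub_device() {
    target=$1
    passes=$2
    count=${3:+count=$3}        # empty when no COUNT was given
    i=1
    while [ "$i" -le "$passes" ]; do
        for src in /dev/zero /dev/urandom; do
            echo "pass $i: $src -> $target"
            # $count is left unquoted on purpose so it vanishes when empty
            dd if="$src" of="$target" bs=1024k $count
        done
        i=$((i + 1))
    done
}
```

For example, `scrub_device /tmp/scratch.img 2 16` exercises two pass pairs on a 16 MB scratch file, while `scrub_device /dev/rdsk/cXtYdZ 3` would make three pass pairs over a whole raw device.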

Regards!

...JRF...
Bill Hassell
Honored Contributor

Re: Disk Scrubbing

There is a much simpler solution when you are using a large disk array like the EMC Symmetrix or similar. When you are ready to scramble your data, you simply start swapping disks around in the cabinet. The striping and metavolumes will be completely unreadable and the Symm will have to be reinitialized. Or you can load a new bin file and that will also scramble the bits. NOTE - NOTE - NOTE: If the Symm is set up with BCVs or snaps, you will not see these without Symm/SAN mapping changes. You will leave valuable data untouched if you don't destroy their data too.

Your corporate security has been watching too many episodes of NCIS or 24. Modern disk arrays are so complex that scrambling LUNs and striping is often done by accident. As far as overwriting 5 or 10 times goes, the equipment needed to dig into the remnants of overwritten data costs millions of dollars and doesn't fit in a laptop. And as for stealing the Symm, a fully loaded 8830 weighs several thousand pounds.

The dd+urandom solution is the best, but be sure you keep all the paths busy so the task will complete in less than a month. Always use bs=1024k. Otherwise, a 5 TB array may require 6 months to erase.
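One way to keep several paths busy is to start one dd per LUN in the background and wait for all of them. This is only a sketch: `scrub_parallel` is a hypothetical name, and how many writers an array and its paths can usefully absorb is something you would have to measure on your own hardware.

```shell
# scrub_parallel COUNT TARGET...
#   Start one zero-fill dd per target in the background, then wait for all.
#   COUNT is the number of 1 MB blocks per target; pass "" to write until
#   the end of each device is reached.
scrub_parallel() {
    count=${1:+count=$1}        # empty when COUNT is ""
    shift
    for target in "$@"; do
        # $count is left unquoted on purpose so it vanishes when empty
        dd if=/dev/zero of="$target" bs=1024k $count &
    done
    wait                        # block until every background dd has exited
    echo "all $# writers finished"
}
```

A call such as `scrub_parallel "" /dev/rdsk/c1t0d0 /dev/rdsk/c2t0d0` (device names made up) would zero both LUNs at the same time; picking devices that sit behind different controllers is what spreads the load across paths.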


Bill Hassell, sysadmin
Dennis Handly
Acclaimed Contributor

Re: Disk Scrubbing

>Bill: Your corporate security has been watching too many episodes of NCIS or 24.

I thought that if they watched those and read about the NSA, they would know you have to physically destroy the disks. :-)
Armin Kunaschik
Esteemed Contributor

Re: Disk Scrubbing

I read in an interview that dd from /dev/zero is enough to keep average to advanced users from recovering your data. Only expensive recovery companies are (in some cases) able to recover overwritten data. So it depends on your security needs. If they are really high I would physically destroy the disks.
In all other cases simply overwriting them (with any pattern) should do the job.

If you care about performance use dd from /dev/zero.
/dev/urandom is quite slow when it has to generate that much random data.
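How much slower depends on the machine, so it is worth measuring before planning the passes. A small sketch, assuming your `date` supports `+%s` (not every Unix does); `time_source` is a hypothetical helper name:

```shell
# time_source SRC MB: read MB blocks of 1 MB from SRC, discard them,
# and print the elapsed wall-clock time in whole seconds.
time_source() {
    start=$(date +%s)
    dd if="$1" of=/dev/null bs=1024k count="$2" 2>/dev/null
    end=$(date +%s)
    echo "$1: $2 MB in $((end - start)) s"
}
```

Running `time_source /dev/zero 1024` and `time_source /dev/urandom 1024` back to back gives a rough idea of whether random-data generation or the disk would be the bottleneck.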

My 2 cents,
Armin

PS: Please assign points if you find answers useful!
And now for something completely different...
John Guster
Trusted Contributor

Re: Disk Scrubbing

How about the mediainit command?
Mark Sellan
Advisor

Re: Disk Scrubbing

We're about to scrub a much smaller array but need the same thoroughness of scrubbing.

We're going to scramble the disks but before that we're planning to run a utility we found (rather than write our own) on SourceForge called Diskscrub which supposedly can scrub 5TB in a week.

It has been ported to HP-UX and, according to its docs, can scrub to NNSA Policy Letter NAP-14.x and DoD 5220.22-M minimum levels.

Has anyone used this tool?

-mark