Thin provisioning - run out of space

ewald · ‎10-27-2009

What happens if you run out of space when using thin provisioning?

In the manual I find "write failure".

Uwe Zessin · ‎10-27-2009

Well, I assume that all writes (at least to unallocated areas of a volume) will be rejected - that makes the volume unusable.

Please do not turn on thin provisioning just because the marketing message sounds great!

This only works well with predictable data growth and good planning / monitoring.

teledata · ‎10-28-2009

Database writes will fail resulting in application errors.

If you run out of space on a VMFS volume, your virtual machine will actually crash.

Let me stress and underline the previous poster's warning.

YES, thin provisioning is a great way to better utilize storage, but PLEASE implement, test, and adhere to a robust monitoring/reporting/alerting methodology that ensures you maintain enough space.

It is also important to have enough advance warning and planning to be able to budget, get approval, acquire, and implement additional SAN modules BEFORE the volumes run out of space ;)

I carry with me a Linux Nagios monitoring appliance that I always implement for customers if they don't already have an enterprise monitoring solution in place.
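To put a number on "enough advance warning", here is a minimal sketch in Python. It is not part of any P4000 tooling: the function names, the assumption of steady linear growth, and the figures in the usage note are all hypothetical.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate days until the pool is full, assuming steady linear growth.

    Returns None when usage is flat or shrinking (no meaningful estimate).
    """
    if growth_gb_per_day <= 0:
        return None
    free_gb = max(capacity_gb - used_gb, 0)
    return free_gb / growth_gb_per_day


def should_order_now(capacity_gb, used_gb, growth_gb_per_day, lead_time_days):
    """True if the estimated time to full is within the purchasing lead time.

    lead_time_days is a hypothetical figure covering budget, approval,
    delivery and installation of additional storage.
    """
    remaining = days_until_full(capacity_gb, used_gb, growth_gb_per_day)
    return remaining is not None and remaining <= lead_time_days
```

For example, with 10000 GB of capacity, 7000 GB used and 100 GB of growth per day, the estimate is 30 days, so a 45-day purchasing lead time means the order is already late. Real growth is rarely linear, so treat the result as a rough planning figure.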

http://www.tdonline.com

Gauche · ‎10-29-2009

Those are two of the most dire opinions about thin provisioning I've heard. Not wrong, but definitely the negative side of the story.

I'd like to add some of the important things they did not point out.

-Thin provisioning on a P4000 SAN is like a light switch: turn it off and on all you want, volume by volume. So there is no need to dread making this decision; you can change your mind anytime.

-Thin provisioning works on any app. As far as space savings go, the worst-case scenario with thin provisioning is that you use just as much space as if you had fully provisioned the volume. That is really rare, too.

-Thin provisioning exposes no risk at all unless you overprovision the SAN. The P4000 utilization graph actually changes if you do overprovision and points this out. Until you have allocated and exposed more storage than exists in the SAN, you have no exposure to the possibility of running out of space for new writes.

-You'll get alerts as you approach the capacity limit of the SAN. You'll actually get many, because as you get close to the limit, snapshot schedules will alert that they could not take the next snapshot for lack of space. If you are not paying any attention to alerts or warnings in the UI, then you are exposing yourself to more issues, and more common ones, than just thin provisioning; you would not even know when disks fail.

-When you see these alerts you can do much more than just purchase more storage. The quick reactions before that are things like deleting some unneeded snapshots, lowering the replication level of a volume, making a full volume thin (that frees up unused pages for everybody else), or deleting a volume (assuming you really don't need it). Most customers seem to just delete some snapshots, and change their schedules to keep fewer copies around, until they can get more storage.
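The overprovisioning condition described above can be sketched as a simple comparison. This is only an illustration under stated assumptions: the `Volume` class, its `copies` field (standing in for the replication level) and the capacity figures are hypothetical, and snapshot space is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    size_gb: float   # size presented to the host
    copies: int = 1  # hypothetical: number of replicated copies kept in the cluster


def provisioned_gb(volumes):
    """Total space the cluster would need if every volume filled up completely."""
    return sum(v.size_gb * v.copies for v in volumes)


def is_overprovisioned(volumes, cluster_capacity_gb):
    """True once more storage has been exposed than physically exists."""
    return provisioned_gb(volumes) > cluster_capacity_gb
```

With a 1000 GB volume kept in two copies and a 500 GB volume kept in one, 2500 GB is exposed: a 3000 GB cluster is not overprovisioned, a 2000 GB cluster is. The sketch also shows why lowering the replication level of a volume is one of the quick ways to free space.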

Because you did ask, a more complete description of what does happen would be this.
If you do thin provision volumes, and over-allocate the SAN, and ignore all warnings and alerts, and don't free up space, and actually write the SAN down to having zero pages available to allocate, this will happen:
-All fully provisioned volumes will be unaffected (that's the point of fully provisioning them)
-Snapshot and replication schedules will not run because there is no space for additional snapshots. Every one of them will also throw alerts; many usually do so before the SAN is actually out of space, depending on the size of the snapshots.
-Thin provisioned volumes would remain online and reads would continue to succeed.
-SCSI writes to thin provisioned volumes would fail until pages are freed up using one of the aforementioned ways.

The exact effect at the server level of the SCSI writes failing depends on the app and OS.
Any app that bothers to zero out its data space would not be affected, even on thin provisioned LUNs. An example of this is a VMware VM that eager zeroed its disk, such as a fault tolerant VM.
An app or OS that did not zero out its data first would probably error, crash, or blue screen; a VM might power off.
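The behaviour described above can be illustrated with a toy model. This is a sketch for intuition only, not how the SAN is implemented: the `Pool`, `Volume` and `PoolFull` names are hypothetical, pages are modelled as dictionary entries, and snapshots are left out.

```python
class PoolFull(Exception):
    """Raised when a write needs a new page and none is free."""


class Pool:
    def __init__(self, total_pages):
        self.free_pages = total_pages

    def take(self, count=1):
        if count > self.free_pages:
            raise PoolFull("no free pages left in the pool")
        self.free_pages -= count


class Volume:
    def __init__(self, pool, size_pages, thin):
        self.pool = pool
        self.size_pages = size_pages
        self.pages = {}  # page index -> data, only for pages that are backed
        if not thin:
            # Full provisioning reserves every page up front.
            pool.take(size_pages)
            self.pages = {i: b"" for i in range(size_pages)}

    def write(self, index, data):
        if not 0 <= index < self.size_pages:
            raise IndexError("page index outside the volume")
        if index not in self.pages:
            # First write to this page: a new page must come from the pool.
            self.pool.take()
        self.pages[index] = data

    def read(self, index):
        # Reads never allocate; unbacked pages read back as empty.
        return self.pages.get(index, b"")
```

In this model, once the pool reaches zero free pages, the fully provisioned volume keeps accepting writes, the thin volume still serves reads and overwrites of pages it already owns (the eager-zeroed case), and only a write to a never-touched page raises `PoolFull`.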

Hope that helps.

Adam C, LeftHand Product Manager

Bryan McMullan · ‎11-02-2009

We've actually had this problem... multiple times. It's the benefit and curse of thin provisioning.

With thin provisioning, "chunks" of space are allocated to each volume you create. If the cluster is maxed out and over-allocated and a volume then uses up its most recent "chunk" of space, writes to that specific volume will fail. Other volumes that have not used up their most recent "chunk" will still function normally.

So the world does not end, but it's a pain to dig out of.

To fix the situation, you need to start deleting snapshots if you can. Be patient and delete the smallest snapshots first, because the cluster will be performing poorly. Eventually you will recover enough space that the volumes that are stuck will be able to allocate another "chunk". In extreme cases, I've switched a volume from 2-way replication to no replication to get space (we grew 3 TB in 3 days... it was a bad time).

If you have no snapshots to delete, or cannot set a less data-intensive replication mode, you just might be out of luck until you can get another unit. I've never been that unlucky.
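The "smallest snapshots first" approach can be sketched as a small planning helper. This is a rough illustration, not a LeftHand tool: `plan_deletions` is a hypothetical name, the snapshot list is assumed to contain only snapshots that are safe to remove, and it assumes that deleting a snapshot frees exactly its reported size, which may not hold on a real cluster.

```python
def plan_deletions(snapshots, needed_gb):
    """Pick snapshots to delete, smallest first, until needed_gb is covered.

    snapshots: list of (name, size_gb) tuples that are safe to delete.
    Returns (names_in_deletion_order, total_gb_freed).
    """
    plan, freed = [], 0.0
    for name, size_gb in sorted(snapshots, key=lambda s: s[1]):
        if freed >= needed_gb:
            break
        plan.append(name)
        freed += size_gb
    return plan, freed
```

If the returned total is still below the amount needed, the list has been exhausted and the remaining options are the ones mentioned above: lowering replication or adding storage.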

bucketenator · ‎01-17-2010

re: alerts for utilisation - is there a way to specify thresholds other than the default 90% WARNING / 95% CRITICAL? I'd like to drop these to ~80%/85% to get more advance notice of out-of-space conditions.

Thanks,

JD

teledata · ‎01-18-2010

I've written my own SNMP checks via Nagios that allow notification at user-defined warning and critical thresholds.

I'd be happy to post my nagios check command for cluster utilization if you'd like.
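In the meantime, the threshold logic of such a check can be sketched in a few lines of Python. This is not the check command mentioned above, just a minimal illustration: `check_utilization` is a hypothetical name, the 80%/85% defaults are the values asked about earlier in the thread, and the used/total figures are assumed to have been fetched already (for example via SNMP). The return codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_utilization(used_gb, total_gb, warn_pct=80.0, crit_pct=85.0):
    """Return (nagios_exit_code, message) for a cluster utilization reading."""
    if total_gb <= 0 or used_gb < 0:
        return UNKNOWN, "UNKNOWN - invalid capacity values"
    pct = 100.0 * used_gb / total_gb
    if pct >= crit_pct:
        status, label = CRITICAL, "CRITICAL"
    elif pct >= warn_pct:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"{label} - cluster utilization {pct:.1f}%"
```

A wrapper script would print the message and exit with the returned code so that Nagios picks up the state.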

Chris House · ‎02-18-2011

I'd be interested in your work with Nagios/SNMP and XP ThP monitoring.

teledata · ‎02-21-2011

I've written a few installments on using Nagios (particularly Groundwork OpenSource) with SNMP checks for HP P4000/LeftHand:

http://www.tdonline.com/training/lefthand/

The latest command for checking cluster storage utilization (check_lhc) is found here:
http://www.tdonline.com/training/lefthand/scripts/

You will want to use the HP MIB files that are found if you do a "complete" install of the management console. They will be here:
C:\Program Files (x86)\HP\P4000\UI\mibs

They need to be installed into your Nagios MIB folder. (For Groundwork it is: /usr/local/groundwork/common/share/snmp/mibs)
