Failed node on 8200

mulae
Posts: 6
Joined: Mon Mar 04, 2019 8:57 am

Failed node on 8200

Post by mulae »

Hi All,

Maybe someone could help me reason this out. We have a node failure on one of our 8200s. The node failed more than a week ago (still waiting for a replacement from HPE :( ).

While the node is down, we are noticing that:

InSplore files and graphs cannot be generated.

We are also noticing space issues. For instance, we have allocated 13TB from storage (dedup RAID 6 volume) to our VMware cluster (VMFS 6, so unmap should run automatically). Of the 13TB we are consuming around 7TB.

From the storage side we are seeing allocated capacity increasing, with the compaction ratio going down and dedup back to 1:1.
Free space on this array is now at 3% and constantly decreasing.

HPE are saying this is normal growth, however from the VMware side we are still seeing just 7TB being utilised. So how come space consumption on the 3PAR keeps increasing?


--Estimated(MB)---
RawFree UsableFree
 374400     249600
MTB-3PAR-M cli%
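A quick back-of-the-envelope in Python shows how little runway the quoted UsableFree figure leaves. The daily growth rate here is a hypothetical placeholder, not a measured figure; you'd want to measure your own rate by comparing showspace output a day apart:

```python
# Rough capacity runway check based on the showspace estimate above.
# assumed_growth_tb_per_day is hypothetical -- substitute a measured rate.

MB_PER_TB = 1024 * 1024

usable_free_mb = 249_600              # "UsableFree" from showspace
usable_free_tb = usable_free_mb / MB_PER_TB

assumed_growth_tb_per_day = 0.05      # hypothetical: 50 GB/day net growth

days_left = usable_free_tb / assumed_growth_tb_per_day

print(f"Usable free: {usable_free_tb:.2f} TB")
print(f"Runway at {assumed_growth_tb_per_day} TB/day: {days_left:.1f} days")
```

Even at a modest assumed growth rate, roughly a quarter of a terabyte of usable free space is measured in days, not weeks.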

Just wondering if anyone has experienced behaviour similar to ours and, more importantly, whether we are risking the storage running out of space.

Thank you
Attachments
3par.PNG (17.48 KiB)
MammaGutt
Posts: 1578
Joined: Mon Sep 21, 2015 2:11 pm
Location: Europe

Re: Failed node on 8200

Post by MammaGutt »

I think HPE is forgetting something very important.

Garbage Collection for dedupe runs on all nodes in the cluster that aren't master. When you have an 8200 (a two-node system) and one node fails, how many nodes do you have left for Garbage Collection?

If you run out of space with dedupe, you're pretty much screwed... There is a way to manually remove and reduce spare chunklets to get a little more space, but it seems to me your failed node needs replacement ASAP to get GC back up and running.

What 3PAR OS version are you running, and which version of dedupe? (3.3.1 introduced dedupe v3, which is shown by "showcpg -d" in the CLI under "shared version".)
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.
mulae
Posts: 6
Joined: Mon Mar 04, 2019 8:57 am

Re: Failed node on 8200

Post by mulae »

We have enquired about GC but all they reported is that the system is stable. After almost 4 different agents we finally got hold of one who immediately identified the issue and said that GC is running, but very slowly, at approx 6GB every 30 mins. I'm not sure if GC behaviour has changed over versions; I say this because the support guy gave us the stats below, so it seems GC is still running but at a slower pace:
GC: Final Stats: Total space freed: 6 GB Total time taken: 1857 secs Dryrun: 0 Abandon run: 0

The dedup version running is v2, and the 3PAR is on 3.3.1 MU2.
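Taking the quoted stats at face value (6 GB freed in 1857 seconds), a short Python sketch puts that rate in context. The 1 TB reclaim target is hypothetical, purely to illustrate the pace:

```python
# Back-of-the-envelope check on the GC stats quoted above:
# "Total space freed: 6 GB, Total time taken: 1857 secs".

freed_gb = 6
elapsed_s = 1857

rate_gb_per_hour = freed_gb / elapsed_s * 3600
hours_per_tb = 1024 / rate_gb_per_hour   # hypothetical 1 TB reclaim target

print(f"GC rate: {rate_gb_per_hour:.1f} GB/hour")
print(f"Time to reclaim 1 TB at this rate: {hours_per_tb / 24:.1f} days")
```

At roughly 11-12 GB/hour, clawing back each terabyte of bloated DDS space would take several days even if nothing else were growing underneath it.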
MammaGutt
Posts: 1578
Joined: Mon Sep 21, 2015 2:11 pm
Location: Europe

Re: Failed node on 8200

Post by MammaGutt »

mulae wrote:We have enquired about GC but all they reported is that the system is stable. After almost 4 different agents we finally got hold of one who immediately identified the issue and said that GC is running, but very slowly, at approx 6GB every 30 mins. I'm not sure if GC behaviour has changed over versions; I say this because the support guy gave us the stats below, so it seems GC is still running but at a slower pace:
GC: Final Stats: Total space freed: 6 GB Total time taken: 1857 secs Dryrun: 0 Abandon run: 0

The dedup version running is v2, and the 3PAR is on 3.3.1 MU2.


To me, uncontrolled growth is not a stable system.

Adding to this, the fact that you have TDVV2 doesn't make this a better scenario. If I were you I would push back on HPE on this... Uncontrolled growth due to stalled GC suggests that your DDS is growing. With TDVV2, DDS compaction/reduction is like waiting for the polar ice cap to melt, so even when the node eventually gets replaced it will most likely take a very long time until you've regained the "blocked" space.

Not knowing a lot about your environment: do you have capacity somewhere else that would allow you to "start from scratch" and get TDVV3? That should greatly reduce the impact of the issue I think you are seeing, but it would require all dedupe volumes to be deleted or converted for a new TDVV3 DDS to be created. Considering you have less than 2% free capacity (and probably even less today), you don't have the free space needed for converting.
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.
User avatar
Namlehse
Posts: 30
Joined: Mon Jan 13, 2014 11:58 am
Location: Claremore, OK

Re: Failed node on 8200

Post by Namlehse »

Not related to the question, but I had the same issue getting a replacement node for one of my 8400s recently. After a week of non-stop complaining, I found out no one was checking compatible part numbers.

Hopefully you have a replacement by now, but if not, that's why. It took less than an hour to get one after they discovered the "oops".

It took us four days to get a replacement from more or less across the street.
vSphere | Windows | Linux
2x 3Par 7400 | Brocade SAN