
Storeonce 5100 CIFS & Catalyst Replication Issues

 
SOLVED
andy91717
Occasional Advisor


Hi,

Last week a group of our cross-site StoreOnce 5100s stopped replicating data; neither CIFS nor Catalyst replication is working. We have the following setup:

  • Site A = 2 x 5100s
  • Site B = 1 x 5100
  • So we have CIFS & Catalyst (via Data Protector) replication from two D2Ds in Site A to one D2D in Site B
  • The Site B D2D is twice the size of the Site A ones
  • Replication has been working fine for over 4 years!

We came in last Wednesday and noticed the daytime Catalyst backups via Data Protector were not working, and the previous night's CIFS replications had not completed either. We currently have 17 D2Ds in total across the two sites, running either a 2-into-1 replication or a 1-to-1 replication between sites. Only these three problem units can't replicate.

Data Protector error for replication was:

  • Replication error: Changed whilst queued or paused

CIFS D2D replication error:

  • Replication protocol timeout whilst sending or receiving a message
  • Pausing a running file data job
  • Resuming a replication of file data

These three messages have been repeating over and over, ad infinitum, since last week across the 3 CIFS shares we have.

We've tested sending Catalyst data only (easier to test Catalyst than CIFS!) from other D2Ds in Site A to the problem one in Site B, and that works fine. We have also sent from the problem Site A ones to another Site B one, and it fails. So it looks as if the problem lies with the two D2Ds in Site A, as the Site B D2D will happily accept replication data from other Site A D2Ds.

Our networks team have been checking the 10Gb ports in Site A for anomalies and can't find any issues. We are stumped as to what has suddenly caused the two Site A D2Ds to stop replicating any data to Site B.

Has anyone got any ideas as to what could be the issue? Any help or advice would be gratefully received.

Thanks,

Andy

5 REPLIES
chip_73
HPE Pro

Re: Storeonce 5100 CIFS & Catalyst Replication Issues

Andy,

 

Which StoreOnce software version are you running?

It is hard to work out why the replication stopped working without a full support ticket, but the first thing I would try is to reboot the StoreOnce system that has issues.

You can start with one of the source StoreOnce systems and see if that will make any difference.

 

Chip

I'm an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
andy91717
Occasional Advisor

Re: Storeonce 5100 CIFS & Catalyst Replication Issues

Hi Chip,

Thanks for responding. To confirm, all three D2Ds are on 3.18.20 and have been since we upgraded earlier this year.

As for a reboot: yes, when we noticed the issues last Wednesday we rebooted all three of the D2Ds in question hoping it would resolve the issue, but no joy. Sending logs is where it always gets tricky for us. I work on a secure account where we can't send logs to any vendors; we have to download them locally and look for errors ourselves, with advice from vendors on where to look. Do you have any recommendations for which logs to check on one of the D2Ds to find out why the replication is failing?

Thanks,

Andy

chip_73
HPE Pro

Re: Storeonce 5100 CIFS & Catalyst Replication Issues

Andy,

I usually suggest opening a case with StoreOnce support. Without logs this will be a hard task anyway, but it may still be worth it if you have a current support contract.

From what I understand, replication is completely broken only between two specific StoreOnce systems on site A and any targets on site B. All other replication works fine from other StoreOnce systems on site A to the same target on site B. If that is the case, and the two StoreOnce systems with issues are in the same network as the ones that are working fine, that would exclude a network issue (I am thinking of ports 9387 and 9388, which must be open both ways in the firewall for replication to work).
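If you want to rule the firewall in or out quickly, a basic TCP reachability test against the two ports is enough. The following is only a minimal sketch in Python: the port numbers come from the paragraph above, while the function names and the idea of running it from an ordinary host on the same network as the StoreOnce are assumptions for illustration. A successful connect only shows that the port is reachable in one direction; it says nothing about whether replication itself is healthy.

```python
import socket

# Ports named above as required for StoreOnce replication.
REPLICATION_PORTS = (9387, 9388)


def check_tcp_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_replication_ports(host, ports=REPLICATION_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: check_tcp_port(host, port, timeout) for port in ports}
```

Calling `check_replication_ports("<target address>")` from a host on the source side, and then the same thing from the other site towards the source, covers the "both ways" requirement.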

So the focus will be on the two systems with issues.

First of all, is the Replication service on the two systems running? What is the status of the Partner appliances under the Replication tab?

You can also check the replication event history under the same tab. Any specific errors might be helpful.

Regarding the logs: you can collect and download a full support ticket, then unzip it.

Some of the logs that are relevant for replication are:

- <node>/d2d/ssid1/replication: here you can check the event log (RepEvent.txt) and the status log (RepStatus.txt).

- <node>/d2d/ssid1/catalyst: since Catalyst copy is also having issues, check ReplicationObjectStore.log and ReplicationObjectStoreKey.log.
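Since the logs have to be reviewed locally, a small script can do the first pass over the unzipped support ticket. This is a rough sketch in Python and not an HPE tool: the file names are the ones listed above, but the keyword list, the function names and the assumption that the logs are plain text are mine, so adjust them to what you actually see in the files.

```python
from pathlib import Path

# Log file names mentioned above.
LOG_NAMES = (
    "RepEvent.txt",
    "RepStatus.txt",
    "ReplicationObjectStore.log",
    "ReplicationObjectStoreKey.log",
)

# Assumption: simple case-insensitive keywords, not a documented log format.
KEYWORDS = ("error", "timeout", "fail")


def find_replication_logs(ticket_root):
    """Return all files under ticket_root whose name is in LOG_NAMES."""
    root = Path(ticket_root)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.name in LOG_NAMES)


def scan_log(path, keywords=KEYWORDS):
    """Return (line_number, line) pairs for lines containing any keyword."""
    hits = []
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            lowered = line.lower()
            if any(k in lowered for k in keywords):
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

Run `find_replication_logs()` on the directory where the ticket was unzipped, then `scan_log()` on each file it returns, and compare the hits from a failing source system with those from one that replicates normally.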

 

Chip

andy91717
Occasional Advisor
Solution

Re: Storeonce 5100 CIFS & Catalyst Replication Issues

Hi all,

Thanks for the help and advice. We have now resolved this issue and, believe it or not, the cause was one SFP! To confirm, the replication partners were online just fine, with no errors in the partner status. The only symptoms were that CIFS replication would process a tiny bit of data every few minutes and then go back to nothing, while Catalyst replication driven from Data Protector would not process at all.

So on the single D2D in Site B we did some more testing, this time disabling the two 10Gb network ports one at a time, initially from the network switch end. We dropped the first of the two ports and, hey presto, the replication kicked into life straight away. We brought the first port back up and disabled the second, and the replication stopped, the same as when both were online.

It was just the one 10Gb port which was having problems, and when joined in a team with the other 10Gb port, it ground the whole replication to a halt. We left the one port up which was working to get data replicating again between sites, whilst we planned for fixes to the dodgy connection. 
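The one-port-at-a-time test can be written down as a tiny procedure. The sketch below is in Python and is purely illustrative: `replication_works` is a hypothetical callback standing in for the manual step (leave only the given ports enabled at the switch, then watch whether replication moves data), and the port names are made up.

```python
def find_suspect_ports(ports, replication_works):
    """Test each teamed port on its own and return the ones that fail.

    replication_works(active_ports) is a hypothetical callback: it should
    return True if replication moves data with only active_ports enabled.
    """
    suspects = []
    for port in ports:
        # Leave only this port active; the rest of the team is disabled.
        if not replication_works([port]):
            suspects.append(port)
    return suspects
```

In our case, with the first port disabled (only the second active) replication ran, and with only the first active it stalled, so the first port was the suspect.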

We came up with a plan to change SFPs and cables at the D2D end until we fixed the problem. The first SFP we replaced in the back of the D2D fixed it: as soon as it was replaced, the replication kicked straight back in at normal speeds, teamed with its partner connection.

So in the end it was simply one flapping SFP which stopped a teamed connection from processing any data. Simple really now we look back at it, we should have fixed this sooner!

Hope this helps someone else in future.

Cheers, Andy

Sunitha_Mod
Moderator

Re: Storeonce 5100 CIFS & Catalyst Replication Issues

Hello Andy,

That's excellent! 

We are extremely glad to know you were able to find the solution, and we appreciate you keeping us posted.

Thanks,
Sunitha G
I'm an HPE employee.