How data is read\written and MPIO questions

SOLVED
HPSupportHelp
Occasional Advisor

We have a multisite SAN consisting of 8 nodes - four nodes at each site. I want to get a better understanding of how a server writes\reads the data from the storage nodes.  Having read through the manuals, I’m still a little unsure.

 

Windows Servers 2008R2

 

  1. When data is written to a volume configured as RAID 10 or 10+2, does the server write blocks directly to each storage node, or is the data written to one storage node which then copies the blocks to the other node(s) in the cluster?

     

  2. When a server connects to the VIP, does this VIP node then designate another node to act as a gateway connection to handle all the IO for the specific volume?

http://frankdenneman.nl/2009/10/11/lefthand-san-lessons-learned/

 

When the HP DSM for MPIO is used, the DSM for MPIO guide shows the server connecting to all the nodes in the cluster.  This doesn’t seem to fit with the gateway connection explanation.  In fact, the StoreVirtual DSM deployment guide mentions nothing about the gateway connection.

 

  1. If I’ve set the site location for servers correctly in the CMC, then each server should retrieve data from storage nodes located on the same site.  How does this work, especially when servers located at different sites share the same volume – a gateway connection for each server at each site maybe?

     

  2. Is it better to use the Microsoft DSM or the HP DSM for MPIO?

    In the DSM for MPIO deployment guide, the diagram indicates only one node is used for IO when the Microsoft DSM is used. Wouldn’t this cause poor performance?

     

Thanks 

16 REPLIES
oikjn
Honored Contributor

Re: How data is read\written and MPIO questions

Assuming HP MPIO is set up correctly on Windows, there will be a "DSM" connection to each node and one additional connection to one node.  Data read/written will go directly to the node that is supposed to get the data.  From there, if it's a write and the volume is NR10, it will be copied from that node to the other node before the write confirmation is completed.  The advantage of the DSM is that it avoids the problem of trying to write data to or read data from a node that doesn't have the data and would then have to request it from a different node.  The result is that the HP DSM can provide the fastest response and the most IO for the SAN, but it also results in the greatest number of iSCSI connections, so at some point you can run into an iSCSI connection limit or your switches can get overloaded because of the increased sessions.

 

One additional benefit of the HP DSM is that on multi-site SANs where you have assigned sites to the targets and initiators, the initiator will only connect to local storage nodes rather than remote nodes as long as local nodes are available.  This provides a similar benefit to the one described above, but the results are more dramatic because of the higher latency and the bandwidth limitations of a typical multi-site SAN.  It would be very inefficient if your SAN wasn't site aware and attempted to write data to a remote node first: the latency would essentially be 2x the RTT from the server to the remote node + the RTT of the node-to-node connection, as opposed to 2x the RTT from the server to the local node + the RTT between the local and remote node.
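
To put rough numbers on that comparison, here is a minimal Python sketch. The RTT figures are made up for illustration, and the function only restates the arithmetic from the paragraph above; it is not a model of how the SAN actually times a write.

```python
def write_latency_ms(server_to_first_node_rtt, node_to_node_rtt):
    """Estimate from the post: 2x the server-to-node RTT plus the node-to-node RTT."""
    return 2 * server_to_first_node_rtt + node_to_node_rtt

# Hypothetical figures: 0.5 ms RTT inside a site, 10 ms RTT between sites.
LOCAL_RTT_MS = 0.5
INTER_SITE_RTT_MS = 10.0

# Not site aware: the server talks to the remote node first.
remote_first = write_latency_ms(INTER_SITE_RTT_MS, INTER_SITE_RTT_MS)
# Site aware: the server talks to a local node, which replicates to the remote one.
local_first = write_latency_ms(LOCAL_RTT_MS, INTER_SITE_RTT_MS)

print(f"remote node first: {remote_first} ms")
print(f"local node first:  {local_first} ms")
```

With these assumed numbers the remote-first write costs roughly three times as much as the local-first one, and the gap widens as the inter-site RTT grows.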

a_o
Valued Contributor

Re: How data is read\written and MPIO questions

As said, the HP DSM gets better performance, especially in a multi-site SAN.
Essentially, it boils down to the fact that the MS DSM does not really know about the architectural nuances of StoreVirtual, so it handles it like any other iSCSI target that supports MPIO.
The HP DSM, on the other hand, is specifically written to support StoreVirtual. The only downside is that as your number of targets and initiators increases, the HP DSM might not scale as well, because the stress on the network increases with the steep growth in the number of iSCSI connections.

HPSupportHelp
Occasional Advisor

Re: How data is read\written and MPIO questions

Great! thank you for the reply.

 

So just to clarify this:

 

assuming HP MPIO setup correctly on windows, then there will be a "DSM" connection to each node and one additional connection to one node.

 

What is this additional connection and why is it only to one node?

 

Data read/written will go directly to the node that is supposed to get the data.  From there, if its written and NR10 it will be copied from that node to the other node before the write confirmation is completed.

 

Based on this explanation this still leads me to think that one node acts as a gateway to handle all IO, and passes read\write requests to the other nodes (if NR10 is used). Is this correct?

 

The advantage of the DSM is that it avoids the problem of trying to write data or read data from one node who doesn't have the data and would then have to request the data from a different node to provide the data.

 

Based on your first example, I’m still a little unsure of your next explanation.

 

As an example and using HP DSM:

 

If I have four nodes and network raid 10 is used:

 

Write:  Block A gets written directly to node 1, then copied from there to node 2.  Then block B is written directly to node 2 and copied from there to node 3, etc.

 

Read:  If block A is requested, it’s read directly from node1, and if block B is requested, it’s read directly from node 2 etc etc.

 

If the Microsoft DSM is used, do all read\write requests filter through one node?

 

 

a_o
Valued Contributor

Re: How data is read\written and MPIO questions

To expand on the above.
HP DSM has knowledge of the data as it's mapped to the storage system's components - NR10 vs NR5. This is where the administrative connection comes in.
Reads are always done from the one node that has the required data.
Writes are also done to one node, and then replicated to the other participant nodes depending on the network raid level.
oikjn
Honored Contributor

Re: How data is read\written and MPIO questions

The additional connection is an "administrative" connection.  It is the first connection made to the SAN cluster housing the LUN and it doesn't carry any data.  I don't know exactly what traffic goes over it, but you will have data availability as long as any single iSCSI connection to the LUN is present.  The more you have active, the more the HP DSM can balance its usage.

 

Side note, you should switch the load balancing from "Vendor Specific" to "Round Robin".  This is in the HP DSM guide.

 

 

As for your other question: in a 4-node cluster with NR10, each block of data is mirrored on two nodes, and the blocks are striped across all four nodes.  Example:

Node1 - Data AB

Node2 - Data BC

Node3 - Data CD

Node4 - Data AD

 

In this situation, if you are writing Data A, you should optimally write the data to either node1 or node4.  If you send the write request to node2, node2 will accept the data write and then have to send that to both correct nodes which takes extra time.  Likewise on read requests, if you want data C, you would be best to ask Node2 or Node3 for the data as if you ask Node1 or Node4, those nodes could provide the information, but only after requesting it from the correct nodes themselves.  In addition, when you know where all data should be located, your read performance can go up dramatically since you can request different data residing on each node all at the same time so you can get data A from node1, B from node2, C from Node3 and D from node4 much more quickly than if you sent node1 a request for data A,B,C,D.
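
The same example can be written out as a small Python sketch. The placement table is copied from the four-node layout above; the function names and the "least-used node" tie-break are my own illustration and say nothing about how the HP DSM is actually implemented.

```python
# Block placement copied from the example above: each block lives on two nodes.
PLACEMENT = {
    "Node1": {"A", "B"},
    "Node2": {"B", "C"},
    "Node3": {"C", "D"},
    "Node4": {"A", "D"},
}

def nodes_holding(block):
    """Return the nodes that store a copy of the block, in name order."""
    return sorted(node for node, blocks in PLACEMENT.items() if block in blocks)

def extra_hops(block, contacted_node):
    """0 if the contacted node holds the block, 1 if it has to forward the request."""
    return 0 if block in PLACEMENT[contacted_node] else 1

def parallel_read_plan(blocks):
    """Pick one holder per block, preferring the node used least so far."""
    plan = {}
    used = {node: 0 for node in PLACEMENT}
    for block in blocks:
        choice = min(nodes_holding(block), key=lambda node: used[node])
        used[choice] += 1
        plan[block] = choice
    return plan

print(nodes_holding("A"))        # nodes that can take a write of A directly
print(extra_hops("A", "Node2"))  # Node2 does not hold A, so it must forward
print(parallel_read_plan(["A", "B", "C", "D"]))
```

An initiator that knows the placement can spread the four reads over four different nodes, whereas one that sends everything to Node1 forces Node1 to fetch C and D from its peers.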

 

I forget how the MS DSM works as I don't use it, but I think it doesn't go through a single gateway for all requests, but it isn't optimized to request the IO from the correct node that will actually do the IO.

 

Read up on the documentation.  As long as you can use the HP DSM, you really should.  The only real issue is that if you have many nodes and many LUNs you can run into iSCSI connection limits, but I don't think this becomes a real concern until you hit 10+ nodes.

HPSupportHelp
Occasional Advisor

Re: How data is read\written and MPIO questions

Thanks again for the response and helpful explanation.

 

In this situation, if you are writing Data A, you should optimally write the data to either node1 or node4.  If you send the write request to node2, node2 will accept the data write and then have to send that to both correct nodes which takes extra time.  Likewise on read requests, if you want data C, you would be best to ask Node2 or Node3 for the data as if you ask Node1 or Node4, those nodes could provide the information, but only after requesting it from the correct nodes themselves.  In addition, when you know where all data should be located, your read performance can go up dramatically since you can request different data residing on each node all at the same time so you can get data A from node1, B from node2, C from Node3 and D from node4 much more quickly than if you sent node1 a request for data A,B,C,D.

 

So in short, and sorry to sound like a broken record: The write request is written to one node first, which then replicates the data to the other nodes. The next write is written to a different node which again carries out the replication. Data can be read simultaneously from whatever node that has the data.  

Would the node performing the replication at the time be considered the gateway connection?

http://frankdenneman.nl/2009/10/11/lefthand-san-lessons-learned/

 

Side note, you should switch the load balancing from "Vendor Specific" to "Round Robin".  This is in the HP DSM guide.

 

I have looked through this guide and this urged me to ask these questions in the first place.

It does mention that Vendor Specific is selected by default, but doesn’t say to explicitly change this to round robin.  Currently we have left it as the default setting of vendor specific, which I understand to be failover only.  Would you recommend switching this to round robin?

a_o
Valued Contributor
Solution

Re: How data is read\written and MPIO questions

Let's keep it simple.

WRT the gateway,  HP DSM makes it a moot point.
With HP DSM, the initiator is always connected to all of the nodes servicing the LUN.

So, effectively there's no gateway per-se.

HP DSM is aware that there are multiple nodes making up a LUN in StoreVirtual.

OTOH, a generic DSM is not aware of this, and needs a single IP (gateway) - in this case the VIP - to make requests to. This gateway then sends the data back to the initiator (reads) or collects the data (writes) and forwards it to the appropriate node. (The VIP is 'spoofed' to the MAC address of one of the participant nodes, i.e. it's running on one of the nodes.)

 

HP DSM cuts to the chase, as it were, and reads the data from the appropriate node(s). It also sends the data to be written directly to the appropriate node(s), as it knows which node has the data block to be read and which node a modified or new data block needs to be written to.

Every participant node in a LUN has the complete map of all the blocks in a LUN. So, HP DSM only needs to connect administratively to one of them in order to know where each data block belongs.

The other connections are just doing pure IO.

 

"Vendor Specific" is the same as Fail Over Only MPIO.

This default option offers only failover w/MPIO, i.e. only one of the IO paths is active at any one time. The others are passive.

Round Robin gives you better performance, as it's effectively doing true load balancing w/MPIO - i.e. active/active paths to your LUNs.
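
For anyone who wants to see the difference between the two policies spelled out, here is a minimal Python sketch of generic fail-over-only versus round-robin path selection. The path names are hypothetical and this is only an illustration of the two ideas, not the code of the HP DSM or the Microsoft DSM.

```python
from itertools import cycle, islice

def failover_only(paths, healthy):
    """Always use the first healthy path; the others stay passive standbys."""
    for path in paths:
        if path in healthy:
            return path
    raise RuntimeError("no healthy path to the LUN")

def round_robin(paths, healthy):
    """Yield the healthy paths in rotation so that every path carries IO."""
    active = [path for path in paths if path in healthy]
    if not active:
        raise RuntimeError("no healthy path to the LUN")
    yield from cycle(active)

# Hypothetical path names for illustration.
PATHS = ["path1", "path2", "path3"]

failover_choices = [failover_only(PATHS, set(PATHS)) for _ in range(4)]
round_robin_choices = list(islice(round_robin(PATHS, set(PATHS)), 4))

print(failover_choices)     # the same path every time
print(round_robin_choices)  # successive IOs rotate across the paths
```

With fail-over only, a second path is used only once the first one is no longer healthy; with round robin, all healthy paths share the IO.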

 

 

 

oikjn
Honored Contributor

Re: How data is read\written and MPIO questions

++ what a_o said.  Good summary.

 

I do round robin for all my LUNs.  I would only go failover if/when you get close to the iSCSI connection limits or start having switch problems.

 

 

I think you mentioned multi-site SANs...  the DSM will not connect to the remote site nodes unless no local node contains the data.

Sbrown
Valued Contributor

Re: How data is read\written and MPIO questions

So basically ESXi is severely handicapped when using more than 2 nodes (3 with NR5), since it would have to guess which nodes to read from and write to?

 

Is there any way to maintain cluster-wide cache coherency ?

oikjn
Honored Contributor

Re: How data is read\written and MPIO questions

I don't know if I would go so far as to say severely, but yes, the HP DSM is more efficient.

 

Likewise there is little/no advantage to cache coherency since the read hit rate is likely very low on large datasets.

 

 

Avoid NR5 for anything but static archive data.

HPSupportHelp
Occasional Advisor

Re: How data is read\written and MPIO questions

Late reply, been away for a bit.

 

Excellent description.

 

WRT the gateway,  HP DSM makes it a moot point.

 

So when I look in the CMC at the sessions tab of the cluster,  what exactly is this gateway connection column telling me?  Is this the management connection?

 

oikjn
Honored Contributor

Re: How data is read\written and MPIO questions

That is telling you which node that specific iSCSI connection is connected to.  When you have the HP DSM set up correctly, you will see a connection from each server NIC to one node IP (that's the initial iSCSI connection, only used to establish the other DSM connections) and then a bunch of connections labelled something like "VSA_NAME (X.X.X.X),DSM".  The number of those DSM connections is the number of server NICs you set up with MPIO connections times the number of nodes in the SAN cluster, so if you have two server NICs and three nodes, you will have six DSM connections and two connections that don't show as "DSM".
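
As a quick sanity check when reading the sessions tab, the counts described above can be worked out with a few lines of Python. This is just the arithmetic from this post, for one server and one volume; the function name is made up.

```python
def expected_sessions(server_nics, cluster_nodes):
    """Session counts per the post: one initial session per NIC, plus NICs x nodes DSM sessions."""
    dsm = server_nics * cluster_nodes
    non_dsm = server_nics
    return {"dsm": dsm, "non_dsm": non_dsm, "total": dsm + non_dsm}

# The example from the post: two server NICs and three nodes.
print(expected_sessions(2, 3))
```

Running the same function with larger NIC and node counts shows how quickly the session total climbs, which is where the iSCSI connection limits mentioned earlier in the thread come in.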

HPSupportHelp
Occasional Advisor

Re: How data is read\written and MPIO questions

So does this connection come after a host has made the initial connection to the storage node acting as the VIP?

oikjn
Honored Contributor

Re: How data is read\written and MPIO questions

If you are actively watching the iSCSI connection tab when you set up a new target, you will see the initial iSCSI connection, which doesn't show a "DSM" label, and then within a couple of seconds you will see additional iSCSI connections generated and marked as DSM.  This assumes you use the HP DSM, that it's set up as the guide says, and that you change the MPIO setting to round robin (from Vendor Specific).

HPSupportHelp
Occasional Advisor

Re: How data is read\written and MPIO questions

Thanks to oikjn and a_o for your help in answering my queries.  Very helpful indeed.

 

Thanks