| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <head> |
| <meta name="Keywords" content="MC/S, MC/S vs MPIO, multiple connections per session"> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| <meta name="author" content="Daniel Fernandes"> |
| <meta name="Robots" content="index,follow"> |
| <link rel="stylesheet" href="images/Orange.css" type="text/css"> |
| <title>MC/S vs MPIO</title> |
| </head> |
| |
| <body> |
| <!-- wrap starts here --> |
| <div id="wrap"> |
| <div id="header"> |
| <div class="logoimg"></div><h1 id="logo"><span class="orange"></span></h1> |
| <h2 id=slogan>Generic SCSI Target Subsystem for Linux</h2> |
| </div> |
| |
| |
| <!-- content-wrap starts here --> |
| <div id="content-wrap"> |
| |
| <div id="main"> |
| |
| <h1>MC/S vs MPIO</h1> |
| |
| <p>MC/S (Multiple Connections per Session) is a feature of the iSCSI |
| protocol that allows several connections to be combined inside a single |
| session for performance and failover purposes. Let's consider what |
| practical value this feature has compared with OS-level multipath |
| (MPIO), and try to answer why no Open Source OS supports it, despite |
| the many years since the iSCSI protocol came into active use, or is |
| going to implement it in the future.</p> |
| |
| <p>MC/S is done on the iSCSI level, while MPIO is done on a higher |
| level. Hence, the whole MPIO infrastructure is shared among all SCSI |
| transports, including Fibre Channel, SAS, etc.</p> |
| |
| <p>MC/S was designed at a time when most OSes didn't have standard OS-level |
| multipath. Instead, each vendor had its own implementation, which |
| created huge interoperability problems. So, one of the goals of MC/S was |
| to address this issue and standardize the multipath area in a single standard. But |
| nowadays almost all OSes have OS-level multipath implemented using |
| standard SCSI facilities, hence this purpose of MC/S isn't valid anymore.</p> |
| |
| <p>Usually it is claimed that MC/S has the following 2 advantages over MPIO:</p> |
| |
| <ol> |
| <li><span>Faster failover recovery.</span></li> |
| |
| <li><span>Better performance.</span></li> |
| |
| </ol> |
| |
| <p>Let's look at how realistic those claims are.</p> |
| |
| <h2>Failover recovery time</h2> |
| |
| <p>Let's consider a single target exporting a single device over 2 links.</p> |
| |
| <p>For MC/S, failover recovery is quite simple: all outstanding SCSI |
| commands are reassigned to another connection. No other actions are |
| necessary, because the session (i.e. the I_T Nexus) remains the same. |
| Consequently, all reservations and other SCSI states, as well as other |
| initiators connected to the device, remain unaffected.</p> |
| |
| <p>For MPIO, failover recovery is much more complicated, because |
| it involves transferring all outstanding commands and SCSI states from one |
| I_T Nexus to another. The first thing the initiator will do for |
| that is abort all outstanding commands on the faulted |
| I_T Nexus. There are 2 approaches for that: the CLEAR TASK SET and LUN RESET |
| task management functions.</p> |
| |
| <p>The CLEAR TASK SET function aborts all commands on the device. |
| Unfortunately, it has limitations: it isn't always supported by the device, |
| and having a single task set shared among initiators isn't always |
| appropriate for the application.</p> |
| |
| <p>LUN RESET function resets the device.</p> |
| |
| <p>Both the CLEAR TASK SET and LUN RESET functions can harm |
| other initiators to some degree, because all commands from all initiators, not only |
| from the one doing the failover recovery, will be aborted. Additionally, LUN |
| RESET resets all SCSI settings for all connected initiators to the |
| initial state and, if the device had a reservation from any initiator, it will |
| be cleared.</p> |
| |
| <p>But the harm is minimal:</p> |
| |
| <ul> |
| <li><span>With the TAS bit set in the Control Mode page, all the aborted commands will |
| be returned to all affected initiators with TASK ABORTED status, so they |
| can simply retry them immediately. For CLEAR TASK SET, if TAS isn't set, |
| all affected initiators will be notified by the Unit Attention COMMANDS |
| CLEARED BY ANOTHER INITIATOR, so they can also immediately retry all |
| outstanding commands.</span></li> |
| |
| <li><span>In the case of a device reset, the affected initiators will be notified via |
| the corresponding Unit Attention about the reset of |
| all SCSI settings to the initial state. Then the initiators can take the necessary |
| recovery actions. Usually no recovery actions are needed, except for the |
| reservation holder, whose reservation was cleared. For it, recovery might |
| not be trivial. But Persistent Reservations solve this issue, because |
| they are not cleared by the device reset.</span></li> |
| </ul> |
| |
| <p>Thus, with Persistent Reservations or using the CLEAR TASK SET function, |
| the additional failover recovery time that MPIO has compared to MC/S |
| is the time to wait for the reset or command abort to finish plus the time to |
| retry all the aborted commands. On a properly configured system it |
| should be less than a few seconds, which is quite acceptable in practice. |
| If the Linux storage stack is improved to allow aborting all commands submitted |
| to it (currently it is only possible to wait for their completion), then |
| the time to abort all the commands can be decreased to a fraction of a second.</p> |
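| |
| <p>The MPIO recovery sequence discussed in this section can be illustrated with a small toy model. |
| It is only a sketch under simplifying assumptions, not code from any real initiator or target: |
| the <code>Device</code> class, its fields and the <code>mpio_failover</code> function are |
| hypothetical names, and notifications (TASK ABORTED status, Unit Attentions) are not modelled. |
| It shows that both task management functions abort the commands of all initiators, and that only |
| the device reset drops a non-persistent reservation.</p> |

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Device:
    """Toy logical unit with a single task set shared by all initiators."""
    task_set: list = field(default_factory=list)   # (initiator, command) pairs
    reservation: Optional[str] = None               # non-persistent reservation holder
    persistent_reservation: Optional[str] = None    # Persistent Reservation holder

    def submit(self, initiator, command):
        self.task_set.append((initiator, command))

    def clear_task_set(self):
        """Abort the commands of all initiators; reservations are untouched."""
        aborted, self.task_set = self.task_set, []
        return aborted

    def lun_reset(self):
        """Abort all commands and clear the non-persistent reservation."""
        aborted = self.clear_task_set()
        self.reservation = None  # the Persistent Reservation survives the reset
        return aborted


def mpio_failover(device, use_lun_reset=False):
    """Abort everything on the device, then list what each initiator must retry."""
    aborted = device.lun_reset() if use_lun_reset else device.clear_task_set()
    retry = {}
    for owner, command in aborted:
        retry.setdefault(owner, []).append(command)
    return retry
```

| <p>In this model an initiator that was not involved in the failover only has to resubmit the |
| commands listed for it, which matches the "minimal harm" argument above.</p> |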
| |
| <h2>Performance</h2> |
| |
| <p>First, neither MC/S nor MPIO can improve performance if only |
| one SCSI command is sent to the target at a time, for instance, in the case of |
| tape backup and restore. Both MC/S and MPIO work on the command level, |
| so they can't split the data transfer of a single command over several links. |
| Only bonding (also known as NIC teaming or Link Aggregation) can improve |
| performance in this case, because it works on the link level.</p> |
| |
| <p>MC/S over several links preserves command execution order, i.e. with |
| it commands are executed in the same order as they were submitted. MPIO |
| can't preserve this order, because it can't see which command on which |
| link was submitted earlier. Delays in link processing can change |
| the command order at the point where the target receives them.</p> |
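| |
| <p>The reordering effect can be sketched with a few lines of Python. This is only an |
| illustration under assumed conditions: commands are spread round-robin over the links, each |
| link adds a fixed delay, and the function name <code>arrival_order</code> and all numbers are |
| made up for the example.</p> |

```python
def arrival_order(commands, link_delays_ms, submit_interval_ms=1.0):
    """Return the commands in the order the target receives them.

    One command is submitted per interval; command i goes over link
    i % len(link_delays_ms), and each link adds its own fixed delay.
    """
    arrivals = []
    for i, cmd in enumerate(commands):
        link = i % len(link_delays_ms)
        arrival_time = i * submit_interval_ms + link_delays_ms[link]
        arrivals.append((arrival_time, i, cmd))
    # Sort by arrival time; the submission index breaks ties.
    return [cmd for _time, _i, cmd in sorted(arrivals)]


# Equal link delays keep the submission order.
print(arrival_order(list(range(6)), [1.0, 1.0]))
# A slower second link lets later commands overtake earlier ones.
print(arrival_order(list(range(6)), [0.5, 3.0]))
```

| <p>With the second link slower than the first, commands sent over the fast link overtake |
| earlier ones sent over the slow link, and nothing in the individual I_T Nexuses tells the |
| target what the original order was.</p> |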
| |
| <p>Since initiators usually send commands in the order that is optimal for |
| performance, reordering can hurt performance to some degree. But this can happen only with |
| a naive target implementation, which can't recover the optimal command execution |
| order. Currently Linux is not naive and is quite good in this area. See, for |
| instance, the section "SEQUENTIAL ACCESS OVER MPIO" in <a |
| href="vl_res.txt">those measurements</a>. Don't look at the absolute |
| numbers, look at the percentage of performance improvement from using the second link. |
| The result is equivalent to 200 MB/s over two 1 Gbps links, which is close to |
| the possible maximum.</p> |
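| |
| <p>As a quick sanity check of the "close to the possible maximum" statement, the raw line |
| rate can be computed directly. The helper names below are made up for this sketch, and the |
| calculation ignores Ethernet, IP, TCP and iSCSI header overhead, so the real payload ceiling |
| is somewhat lower than the raw figure.</p> |

```python
def raw_link_mb_per_s(gbps):
    """Raw line rate of one link in MB/s (1 MB = 10**6 bytes)."""
    return gbps * 1e9 / 8 / 1e6


def utilization(measured_mb_per_s, links, gbps_per_link=1.0):
    """Fraction of the combined raw line rate that the measured throughput uses."""
    return measured_mb_per_s / (links * raw_link_mb_per_s(gbps_per_link))


print(raw_link_mb_per_s(1.0))   # raw MB/s of a single 1 Gbps link
print(utilization(200, 2))      # share of two 1 Gbps links used by 200 MB/s
```

| <p>Two 1 Gbps links carry at most 250 MB/s of raw bits, so 200 MB/s is 80% of the raw |
| rate before any protocol overhead is subtracted.</p> |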
| |
| <p>If free command reordering is forbidden for a device, either |
| by use of the ORDERED tag, or if the Queue Algorithm Modifier in the Control |
| Mode Page is set to 0, then MPIO will have to maintain command order by |
| sending commands over only a single link. But in practice this case is |
| really rare: 99.(9)% of OSes and applications allow free command |
| reordering, and it is enabled by default.</p> |
| |
| <p>On the other hand, strictly preserving command order as MC/S does has a |
| downside as well. It can lead to the so-called "commands ordering |
| bottleneck", when newer commands have to wait until one or more older |
| commands get executed, although it would be better for performance to |
| reorder them. As a result, MPIO sometimes has better performance than |
| MC/S, especially in setups where the maximum IOPS number is important. See, |
| for instance, |
| <a href="http://article.gmane.org/gmane.linux.scsi/16311">here</a>. |
| </p> |
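| |
| <p>The commands ordering bottleneck is a form of head-of-line blocking, and a rough model |
| makes it visible. The sketch below rests on assumptions chosen for illustration: the function |
| name <code>total_wait</code> and the arrival times are hypothetical, and the target is assumed |
| to be able to start any number of commands at once, so the only delay counted is the one |
| imposed by strict ordering.</p> |

```python
def total_wait(arrival_times, strict_order):
    """Total time commands spend waiting at the target before they may start.

    arrival_times[i] is the arrival time of command i, numbered in
    submission order. With strict ordering a command may not start before
    its predecessor has started; otherwise it starts as soon as it arrives.
    """
    waited = 0.0
    previous_start = 0.0
    for arrival in arrival_times:
        start = max(arrival, previous_start) if strict_order else arrival
        waited += start - arrival
        previous_start = start
    return waited


# Commands 1 and 3 were delayed on a slow link (times in milliseconds).
arrivals = [0.5, 4.0, 2.5, 6.0, 4.5, 8.0]
print(total_wait(arrivals, strict_order=True))
print(total_wait(arrivals, strict_order=False))
```

| <p>With strict ordering, commands 2 and 4 each sit idle until their delayed predecessors |
| arrive; with free reordering nothing waits.</p> |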
| |
| <h2>When MC/S is better than MPIO</h2> |
| |
| <p>For the sake of completeness, we should mention that there are marginal cases where MPIO can't be used or will not |
| provide any benefit, but MC/S can be successful:</p> |
| |
| <ol> |
| <li><span>When strict command order is required.</span></li> |
| |
| <li><span>When aborted commands can't be retried.</span></li> |
| |
| </ol> |
| |
| <p>For disks, both of them are always false. However, for some tape drives |
| and backup applications one or both can be true. But in practice:</p> |
| |
| <ul> |
| |
| <li><span>There are no known tape drives or backup |
| applications that can use multiple outstanding commands at |
| a time. All of them support and use only a single outstanding |
| command at a time. MC/S can't increase performance for them, only |
| bonding can. So, in this case there is no difference between MC/S |
| and MPIO.</span></li> |
| |
| <li><span>The inability to retry commands is a |
| limitation of legacy tape drives, which support only implicit |
| address commands, rather than of MPIO. Modern tape drives and backup |
| applications can use explicit address commands, which can be |
| aborted and then retried, hence they are compatible with MPIO.</span></li> |
| |
| </ul> |
| |
| <h2>Conclusion</h2> |
| |
| <p>Thus:</p> |
| |
| <ol> |
| <li><span>The cost to develop MC/S is high, but its benefits are marginal and can be fully |
| eliminated by future MPIO improvements.</span></li> |
| |
| <li><span>MPIO allows the existing infrastructure to be utilized for all |
| transports, not only iSCSI. |
| </span></li> |
| |
| <li><span>All transports can benefit from improvements in MPIO.</span></li> |
| |
| <li><span>With MPIO there is no need to create multiple layers with very similar |
| functionality.</span></li> |
| |
| <li><span>MPIO doesn't have the commands ordering bottleneck that MC/S has.</span></li> |
| |
| </ol> |
| |
| <p>Simply put, MC/S is a workaround, done on the wrong level, for some deficiencies of the existing SCSI standards used for MPIO, |
| namely the lack of a way to group several I_T Nexuses with the ability to reassign commands |
| between them and preserve command order among them. If those features are added to the SCSI standards in the future, MC/S will |
| not be needed at all, hence all investments in it will be voided. No surprise then that no |
| Open Source OS supports it or is going to implement it. Moreover, |
| when back in 2005 there was an attempt to add an MC/S-capable iSCSI initiator to Linux, it was |
| rejected. For more details see <a href="http://article.gmane.org/gmane.linux.scsi/15769">here</a> |
| and <a href="http://article.gmane.org/gmane.linux.scsi/16301">here</a>. |
| </p> |
| |
| </div> |
| </div> |
| </div> |
| <!-- wrap ends here --> |
| <!-- footer starts here --> |
| <div id="footer"> |
| <p>© Copyright 2004 - 2020 <b><font color="#EC981F">Vladislav Bolkhovitin &amp; others</font></b> |
| Design by: <b><font color="#EC981F">Daniel Fernandes</font></b> </p> |
| </div> |
| <!-- footer ends here --> |
| </body> |
| </html> |