| .\" Licensed under the OpenIB.org BSD license (FreeBSD Variant) - See COPYING.md |
| .TH "RSOCKET" 7 "2019-04-16" "librdmacm" "Librdmacm Programmer's Manual" librdmacm |
| .SH NAME |
| rsocket \- RDMA socket API |
| .SH SYNOPSIS |
| .B "#include <rdma/rsocket.h>" |
| .SH "DESCRIPTION" |
| RDMA socket API and protocol |
| .SH "NOTES" |
| Rsockets is a protocol over RDMA that supports a socket-level API |
| for applications. Rsocket APIs are intended to match the behavior |
| of corresponding socket calls, except where noted. Rsocket |
| functions match the name and function signature of socket calls, |
| with the exception that all function calls are prefixed with an 'r'. |
| .P |
| The following functions are defined: |
| .P |
| rsocket |
| .P |
| rbind, rlisten, raccept, rconnect |
| .P |
| rshutdown, rclose |
| .P |
| rrecv, rrecvfrom, rrecvmsg, rread, rreadv |
| .P |
| rsend, rsendto, rsendmsg, rwrite, rwritev |
| .P |
| rpoll, rselect |
| .P |
| rgetpeername, rgetsockname |
| .P |
| rsetsockopt, rgetsockopt, rfcntl |
| .P |
| Functions take the same parameters as that used for sockets. The |
| follow capabilities and flags are supported at this time: |
| .P |
| PF_INET, PF_INET6, SOCK_STREAM, SOCK_DGRAM |
| .P |
| SOL_SOCKET - SO_ERROR, SO_KEEPALIVE (flag supported, but ignored), |
| SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_REUSEADDR, SO_SNDBUF |
| .P |
| IPPROTO_TCP - TCP_NODELAY, TCP_MAXSEG |
| .P |
| IPPROTO_IPV6 - IPV6_V6ONLY |
| .P |
| MSG_DONTWAIT, MSG_PEEK, O_NONBLOCK |
| .P |
| Rsockets provides extensions beyond normal socket routines that |
| allow for direct placement of data into an application's buffer. |
| This is also known as zero-copy support, since data is sent and |
| received directly, bypassing copies into network controlled buffers. |
| The following calls and options support direct data placement. |
| .P |
| riomap, riounmap, riowrite |
| .TP |
| off_t riomap(int socket, void *buf, size_t len, int prot, int flags, off_t offset) |
| .TP |
| Riomap registers an application buffer with the RDMA hardware |
| associated with an rsocket. The buffer is registered either for |
| local only access (PROT_NONE) or for remote write access (PROT_WRITE). |
| When registered for remote access, the buffer is mapped to a given |
| offset. The offset is either provided by the user, or if the user |
| selects -1 for the offset, rsockets selects one. The remote peer may |
| access an iomapped buffer directly by specifying the correct offset. |
| The mapping is not guaranteed to be available until after the remote |
| peer receives a data transfer initiated after riomap has completed. |
| .PP |
| In order to enable the use of remote IO mapping calls on an rsocket, |
| an application must set the number of IO mappings that are available |
| to the remote peer. This may be done using the rsetsockopt |
| RDMA_IOMAPSIZE option. By default, an rsocket does not support |
| remote IO mappings. |
| riounmap |
| .TP |
| int riounmap(int socket, void *buf, size_t len) |
| .TP |
| Riounmap removes the mapping between a buffer and an rsocket. |
| .P |
| riowrite |
| .TP |
| size_t riowrite(int socket, const void *buf, size_t count, off_t offset, int flags) |
| .TP |
| Riowrite allows an application to transfer data over an rsocket |
| directly into a remotely iomapped buffer. The remote buffer is specified |
| through an offset parameter, which corresponds to a remote iomapped buffer. |
| From the sender's perspective, riowrite behaves similar to rwrite. From |
| a receiver's view, riowrite transfers are silently redirected into a pre- |
| determined data buffer. Data is received automatically, and the receiver |
| is not informed of the transfer. However, iowrite data is still considered |
| part of the data stream, such that iowrite data will be written before a |
| subsequent transfer is received. A message sent immediately after initiating |
| an iowrite may be used to notify the receiver of the iowrite. |
| .P |
| In addition to standard socket options, rsockets supports options |
| specific to RDMA devices and protocols. These options are accessible |
| through rsetsockopt using SOL_RDMA option level. |
| .TP |
| RDMA_SQSIZE - Integer size of the underlying send queue. |
| .TP |
| RDMA_RQSIZE - Integer size of the underlying receive queue. |
| .TP |
| RDMA_INLINE - Integer size of inline data. |
| .TP |
| RDMA_IOMAPSIZE - Integer number of remote IO mappings supported |
| .TP |
| RDMA_ROUTE - struct ibv_path_data of path record for connection. |
| .P |
| Note that rsockets fd's cannot be passed into non-rsocket calls. For |
| applications which must mix rsocket fd's with standard socket fd's or |
| opened files, rpoll and rselect support polling both rsockets and |
| normal fd's. |
| .P |
| Existing applications can make use of rsockets through the use of a |
| preload library. Because rsockets implements an end-to-end protocol, |
| both sides of a connection must use rsockets. The rdma_cm library |
| provides such a preload library, librspreload. To reduce the chance |
| of the preload library intercepting calls without the user's explicit |
| knowledge, the librspreload library is installed into %libdir%/rsocket |
| subdirectory. |
| .P |
| The preload library can be used by setting LD_PRELOAD when running. |
| Note that not all applications will work with rsockets. Support is |
| limited based on the socket options used by the application. |
| Support for fork() is limited, but available. To use rsockets with |
| the preload library for applications that call fork, users must |
| set the environment variable RDMAV_FORK_SAFE=1 on both the client |
| and server side of the connection. In general, fork is |
| supportable for server applications that accept a connection, then |
| fork off a process to handle the new connection. |
| .P |
| rsockets uses configuration files that give an administrator control |
| over the default settings used by rsockets. Use files under |
| @CMAKE_INSTALL_FULL_SYSCONFDIR@/rdma/rsocket as shown: |
| .P |
| .P |
| mem_default - default size of receive buffer(s) |
| .P |
| wmem_default - default size of send buffer(s) |
| .P |
| sqsize_default - default size of send queue |
| .P |
| rqsize_default - default size of receive queue |
| .P |
| inline_default - default size of inline data |
| .P |
| iomap_size - default size of remote iomapping table |
| .P |
| polling_time - default number of microseconds to poll for data before waiting |
| .P |
| wake_up_interval - maximum number of milliseconds to block in poll. |
| This value is used to safe guard against potential application hangs |
| in rpoll(). |
| .P |
| All configuration files should contain a single integer value. Values may |
| be set by issuing a command similar to the following example. |
| .P |
| echo 1000000 > @CMAKE_INSTALL_FULL_SYSCONFDIR@/rdma/rsocket/mem_default |
| .P |
| If configuration files are not available, rsockets uses internal defaults. |
| Applications can override default values programmatically through the |
| rsetsockopt routine. |
| .SH "SEE ALSO" |
| rdma_cm(7) |