| @node Low-Level I/O, File System Interface, I/O on Streams, Top |
| @c %MENU% Low-level, less portable I/O |
| @chapter Low-Level Input/Output |
| |
| This chapter describes functions for performing low-level input/output |
| operations on file descriptors. These functions include the primitives |
| for the higher-level I/O functions described in @ref{I/O on Streams}, as |
| well as functions for performing low-level control operations for which |
| there are no equivalents on streams. |
| |
| Stream-level I/O is more flexible and usually more convenient; |
| therefore, programmers generally use the descriptor-level functions only |
| when necessary. These are some of the usual reasons: |
| |
| @itemize @bullet |
| @item |
| For reading binary files in large chunks. |
| |
| @item |
| For reading an entire file into memory before parsing it. |
| |
| @item |
| To perform operations other than data transfer, which can only be done |
| with a descriptor. (You can use @code{fileno} to get the descriptor |
| corresponding to a stream.) |
| |
| @item |
| To pass descriptors to a child process. (The child can create its own |
| stream to use a descriptor that it inherits, but cannot inherit a stream |
| directly.) |
| @end itemize |
| |
| @menu |
| * Opening and Closing Files:: How to open and close file |
| descriptors. |
| * I/O Primitives:: Reading and writing data. |
| * File Position Primitive:: Setting a descriptor's file |
| position. |
| * Descriptors and Streams:: Converting descriptor to stream |
| or vice-versa. |
| * Stream/Descriptor Precautions:: Precautions needed if you use both |
| descriptors and streams. |
| * Scatter-Gather:: Fast I/O to discontinuous buffers. |
| * Memory-mapped I/O:: Using files like memory. |
| * Waiting for I/O:: How to check for input or output |
| on multiple file descriptors. |
| * Synchronizing I/O:: Making sure all I/O actions completed. |
| * Asynchronous I/O:: Perform I/O in parallel. |
| * Control Operations:: Various other operations on file |
| descriptors. |
| * Duplicating Descriptors:: Fcntl commands for duplicating |
| file descriptors. |
| * Descriptor Flags:: Fcntl commands for manipulating |
| flags associated with file |
| descriptors. |
| * File Status Flags:: Fcntl commands for manipulating |
| flags associated with open files. |
| * File Locks:: Fcntl commands for implementing |
| file locking. |
| * Interrupt Input:: Getting an asynchronous signal when |
| input arrives. |
| * IOCTLs:: Generic I/O Control operations. |
| @end menu |
| |
| |
| @node Opening and Closing Files |
| @section Opening and Closing Files |
| |
| @cindex opening a file descriptor |
| @cindex closing a file descriptor |
| This section describes the primitives for opening and closing files |
| using file descriptors. The @code{open} and @code{creat} functions are |
| declared in the header file @file{fcntl.h}, while @code{close} is |
| declared in @file{unistd.h}. |
| @pindex unistd.h |
| @pindex fcntl.h |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
| The @code{open} function creates and returns a new file descriptor for |
| the file named by @var{filename}. Initially, the file position |
| indicator for the file is at the beginning of the file. The argument |
| @var{mode} (@pxref{Permission Bits}) is used only when a file is |
| created, but it doesn't hurt to supply the argument in any case. |
| |
| The @var{flags} argument controls how the file is to be opened. This is |
| a bit mask; you create the value by the bitwise OR of the appropriate |
| parameters (using the @samp{|} operator in C). |
| @xref{File Status Flags}, for the parameters available. |
| |
| The normal return value from @code{open} is a non-negative integer file |
| descriptor. In the case of an error, a value of @math{-1} is returned |
| instead. In addition to the usual file name errors (@pxref{File |
| Name Errors}), the following @code{errno} error conditions are defined |
| for this function: |
| |
| @table @code |
| @item EACCES |
| The file exists but is not readable/writable as requested by the @var{flags} |
| argument, or the file does not exist and the directory is unwritable so |
| it cannot be created. |
| |
| @item EEXIST |
| Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already |
| exists. |
| |
| @item EINTR |
| The @code{open} operation was interrupted by a signal. |
| @xref{Interrupted Primitives}. |
| |
| @item EISDIR |
| The @var{flags} argument specified write access, and the file is a directory. |
| |
| @item EMFILE |
| The process has too many files open. |
| The maximum number of file descriptors is controlled by the |
| @code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}. |
| |
| @item ENFILE |
| The entire system, or perhaps the file system which contains the |
| directory, cannot support any additional open files at the moment. |
| (This problem cannot happen on @gnuhurdsystems{}.) |
| |
| @item ENOENT |
| The named file does not exist, and @code{O_CREAT} is not specified. |
| |
| @item ENOSPC |
| The directory or file system that would contain the new file cannot be |
| extended, because there is no disk space left. |
| |
| @item ENXIO |
| @code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags} |
| argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and |
| FIFOs}), and no process has the file open for reading. |
| |
| @item EROFS |
| The file resides on a read-only file system and any of @w{@code{O_WRONLY}}, |
| @code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument, |
| or @code{O_CREAT} is set and the file does not already exist. |
| @end table |
| |
| @c !!! umask |
| |
| If on a 32 bit machine the sources are translated with |
| @code{_FILE_OFFSET_BITS == 64}, the function @code{open} returns a file |
| descriptor opened in the large file mode, which enables the file handling |
| functions to use files up to @math{2^63} bytes in size and offsets from |
| @math{-2^63} to @math{2^63}. This happens transparently for the user |
| since all of the low-level file handling functions are equally replaced. |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread allocates some resources (like memory, file |
| descriptors, semaphores or whatever) at the time @code{open} is |
| called. If the thread gets canceled these resources stay allocated |
| until the program ends. To avoid this, calls to @code{open} should be |
| protected using cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| |
| The @code{open} function is the underlying primitive for the @code{fopen} |
| and @code{freopen} functions, which create streams. |
| @end deftypefun |
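The following is a minimal sketch of how the @var{flags} bit mask and the
@code{errno} conditions described above fit together. The helper name
@code{create_exclusive} and the permission bits @code{0644} are
assumptions made for illustration only; they are not part of the library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create FILENAME for writing, failing if it already exists.
   Returns the new descriptor, or -1 with errno set.
   (create_exclusive is a hypothetical helper name.)  */
static int
create_exclusive (const char *filename)
{
  /* The flags are combined with bitwise OR.  The mode argument is
     consulted only because O_CREAT may create the file.  */
  int fd = open (filename, O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (fd == -1)
    {
      int saved_errno = errno;  /* fprintf may change errno.  */
      fprintf (stderr, "open %s: %s\n", filename, strerror (saved_errno));
      errno = saved_errno;
    }
  return fd;
}
```

Calling the helper a second time on the same name should fail with
@code{EEXIST}, because both @code{O_CREAT} and @code{O_EXCL} are set.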
| |
| @comment fcntl.h |
| @comment Unix98 |
| @deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
| This function is similar to @code{open}. It returns a file descriptor |
| which can be used to access the file named by @var{filename}. The only |
| difference is that on 32 bit systems the file is opened in the |
| large file mode. I.e., file length and file offsets can exceed 31 bits. |
| |
| When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this |
| function is actually available under the name @code{open}. I.e., the |
| new, extended API using 64 bit file sizes and offsets transparently |
| replaces the old API. |
| @end deftypefun |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
| This function is obsolete. The call: |
| |
| @smallexample |
| creat (@var{filename}, @var{mode}) |
| @end smallexample |
| |
| @noindent |
| is equivalent to: |
| |
| @smallexample |
| open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode}) |
| @end smallexample |
| |
| If on a 32 bit machine the sources are translated with |
| @code{_FILE_OFFSET_BITS == 64}, the function @code{creat} returns a file |
| descriptor opened in the large file mode, which enables the file handling |
| functions to use files up to @math{2^63} bytes in size and offsets from |
| @math{-2^63} to @math{2^63}. This happens transparently for the user |
| since all of the low-level file handling functions are equally replaced. |
| @end deftypefn |
| |
| @comment fcntl.h |
| @comment Unix98 |
| @deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
| This function is similar to @code{creat}. It returns a file descriptor |
| which can be used to access the file named by @var{filename}. The only |
| difference is that on 32 bit systems the file is opened in the |
| large file mode. I.e., file length and file offsets can exceed 31 bits. |
| |
| To use large offsets with this file descriptor, use the counterparts of |
| the normal operations named @code{*64} where they exist, e.g., |
| @code{lseek64}. |
| |
| When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this |
| function is actually available under the name @code{creat}. I.e., the |
| new, extended API using 64 bit file sizes and offsets transparently |
| replaces the old API. |
| @end deftypefn |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftypefun int close (int @var{filedes}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{@acsfd{}}} |
| The function @code{close} closes the file descriptor @var{filedes}. |
| Closing a file has the following consequences: |
| |
| @itemize @bullet |
| @item |
| The file descriptor is deallocated. |
| |
| @item |
| Any record locks owned by the process on the file are unlocked. |
| |
| @item |
| When all file descriptors associated with a pipe or FIFO have been closed, |
| any unread data is discarded. |
| @end itemize |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread allocates some resources (like memory, file |
| descriptors, semaphores or whatever) at the time @code{close} is |
| called. If the thread gets canceled these resources stay allocated |
| until the program ends. To avoid this, calls to @code{close} should be |
| protected using cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| |
| The normal return value from @code{close} is @math{0}; a value of @math{-1} |
| is returned in case of failure. The following @code{errno} error |
| conditions are defined for this function: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} argument is not a valid file descriptor. |
| |
| @item EINTR |
| The @code{close} call was interrupted by a signal. |
| @xref{Interrupted Primitives}. |
| Here is an example of how to handle @code{EINTR} properly: |
| |
| @smallexample |
| TEMP_FAILURE_RETRY (close (desc)); |
| @end smallexample |
| |
| @item ENOSPC |
| @itemx EIO |
| @itemx EDQUOT |
| When the file is accessed by NFS, these errors from @code{write} can sometimes |
| not be detected until @code{close}. @xref{I/O Primitives}, for details |
| on their meaning. |
| @end table |
| |
| Please note that there is @emph{no} separate @code{close64} function. |
| This is not necessary since this function neither determines nor depends |
| on the mode of the file. The kernel which performs the @code{close} |
| operation knows which mode the descriptor is used for and can handle |
| this situation. |
| @end deftypefun |
| |
| To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead |
| of trying to close its underlying file descriptor with @code{close}. |
| This flushes any buffered output and updates the stream object to |
| indicate that it is closed. |
| |
| @node I/O Primitives |
| @section Input and Output Primitives |
| |
| This section describes the functions for performing primitive input and |
| output operations on file descriptors: @code{read}, @code{write}, and |
| @code{lseek}. These functions are declared in the header file |
| @file{unistd.h}. |
| @pindex unistd.h |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftp {Data Type} ssize_t |
| This data type is used to represent the sizes of blocks that can be |
| read or written in a single operation. It is similar to @code{size_t}, |
| but must be a signed type. |
| @end deftp |
| |
| @cindex reading from a file descriptor |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| The @code{read} function reads up to @var{size} bytes from the file |
| with descriptor @var{filedes}, storing the results in the @var{buffer}. |
| (This is not necessarily a character string, and no terminating null |
| character is added.) |
| |
| @cindex end-of-file, on a file descriptor |
| The return value is the number of bytes actually read. This might be |
| less than @var{size}; for example, if there aren't that many bytes left |
| in the file or if there aren't that many bytes immediately available. |
| The exact behavior depends on what kind of file it is. Note that |
| reading less than @var{size} bytes is not an error. |
| |
| A value of zero indicates end-of-file (except if the value of the |
| @var{size} argument is also zero). This is not considered an error. |
| If you keep calling @code{read} while at end-of-file, it will keep |
| returning zero and doing nothing else. |
| |
| If @code{read} returns at least one character, there is no way you can |
| tell whether end-of-file was reached. But if you did reach the end, the |
| next read will return zero. |
| |
| In case of an error, @code{read} returns @math{-1}. The following |
| @code{errno} error conditions are defined for this function: |
| |
| @table @code |
| @item EAGAIN |
| Normally, when no input is immediately available, @code{read} waits for |
| some input. But if the @code{O_NONBLOCK} flag is set for the file |
| (@pxref{File Status Flags}), @code{read} returns immediately without |
| reading any data, and reports this error. |
| |
| @strong{Compatibility Note:} Most versions of BSD Unix use a different |
| error code for this: @code{EWOULDBLOCK}. In @theglibc{}, |
| @code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter |
| which name you use. |
| |
| On some systems, reading a large amount of data from a character special |
| file can also fail with @code{EAGAIN} if the kernel cannot find enough |
| physical memory to lock down the user's pages. This is limited to |
| devices that transfer with direct memory access into the user's memory, |
| which means it does not include terminals, since they always use |
| separate buffers inside the kernel. This problem never happens on |
| @gnuhurdsystems{}. |
| |
| Any condition that could result in @code{EAGAIN} can instead result in a |
| successful @code{read} which returns fewer bytes than requested. |
| Calling @code{read} again immediately would result in @code{EAGAIN}. |
| |
| @item EBADF |
| The @var{filedes} argument is not a valid file descriptor, |
| or is not open for reading. |
| |
| @item EINTR |
| @code{read} was interrupted by a signal while it was waiting for input. |
| @xref{Interrupted Primitives}. A signal will not necessarily cause |
| @code{read} to return @code{EINTR}; it may instead result in a |
| successful @code{read} which returns fewer bytes than requested. |
| |
| @item EIO |
| For many devices, and for disk files, this error code indicates |
| a hardware error. |
| |
| @code{EIO} also occurs when a background process tries to read from the |
| controlling terminal, and the normal action of stopping the process by |
| sending it a @code{SIGTTIN} signal isn't working. This might happen if |
| the signal is being blocked or ignored, or because the process group is |
| orphaned. @xref{Job Control}, for more information about job control, |
| and @ref{Signal Handling}, for information about signals. |
| |
| @item EINVAL |
| In some systems, when reading from a character or block device, position |
| and size offsets must be aligned to a particular block size. This error |
| indicates that the offsets were not properly aligned. |
| @end table |
| |
| Please note that there is no function named @code{read64}. This is not |
| necessary since this function does not directly modify or handle the |
| possibly wide file offset. Since the kernel handles this state |
| internally, the @code{read} function can be used for all cases. |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread allocates some resources (like memory, file |
| descriptors, semaphores or whatever) at the time @code{read} is |
| called. If the thread gets canceled these resources stay allocated |
| until the program ends. To avoid this, calls to @code{read} should be |
| protected using cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| |
| The @code{read} function is the underlying primitive for all of the |
| functions that read from streams, such as @code{fgetc}. |
| @end deftypefun |
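Because @code{read} may return fewer bytes than requested, and may fail
with @code{EINTR}, callers that need a full buffer usually loop. The
sketch below shows one way to do that; the helper name @code{read_fully}
is hypothetical and not a library function.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to SIZE bytes from FD into BUFFER, continuing after short
   reads.  Returns the number of bytes read, which is less than SIZE
   only if end-of-file was reached, or -1 on error.
   (read_fully is a hypothetical helper name.)  */
static ssize_t
read_fully (int fd, void *buffer, size_t size)
{
  char *p = buffer;
  size_t total = 0;

  while (total < size)
    {
      ssize_t n = read (fd, p + total, size - total);
      if (n == 0)
        break;                  /* End of file.  */
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;
        }
      total += (size_t) n;
    }
  return (ssize_t) total;
}
```

Note that on a descriptor with @code{O_NONBLOCK} set this loop returns
@math{-1} with @code{EAGAIN} as soon as no data is available, even if
some bytes were already read; a real program would have to decide how to
report that case.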
| |
| @comment unistd.h |
| @comment Unix98 |
| @deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| @c This is usually a safe syscall. The sysdeps/posix fallback emulation |
| @c is not MT-Safe because it uses lseek, read and lseek back, but is it |
| @c used anywhere? |
| The @code{pread} function is similar to the @code{read} function. The |
| first three arguments are identical, and the return values and error |
| codes also correspond. |
| |
| The difference is the fourth argument and its handling. The data block |
| is not read from the current position of the file descriptor |
| @code{filedes}. Instead the data is read from the file starting at |
| position @var{offset}. The position of the file descriptor itself is |
| not affected by the operation. The value is the same as before the call. |
| |
| When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
| @code{pread} function is in fact @code{pread64} and the type |
| @code{off_t} has 64 bits, which makes it possible to handle files up to |
| @math{2^63} bytes in length. |
| |
| The return value of @code{pread} describes the number of bytes read. |
| In the error case it returns @math{-1} like @code{read} does and the |
| error codes are also the same, with these additions: |
| |
| @table @code |
| @item EINVAL |
| The value given for @var{offset} is negative and therefore illegal. |
| |
| @item ESPIPE |
| The file descriptor @var{filedes} is associated with a pipe or a FIFO and |
| this device does not allow positioning of the file pointer. |
| @end table |
| |
| The function is an extension defined in the Single Unix Specification |
| version 2. |
| @end deftypefun |
| |
| @comment unistd.h |
| @comment Unix98 |
| @deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| @c This is usually a safe syscall. The sysdeps/posix fallback emulation |
| @c is not MT-Safe because it uses lseek64, read and lseek64 back, but is |
| @c it used anywhere? |
| This function is similar to the @code{pread} function. The difference |
| is that the @var{offset} parameter is of type @code{off64_t} instead of |
| @code{off_t} which makes it possible on 32 bit machines to address |
| files larger than @math{2^31} bytes and up to @math{2^63} bytes. The |
| file descriptor @code{filedes} must be opened using @code{open64} since |
| otherwise the large offsets possible with @code{off64_t} will lead to |
| errors with a descriptor in small file mode. |
| |
| When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a |
| 32 bit machine this function is actually available under the name |
| @code{pread} and so transparently replaces the 32 bit interface. |
| @end deftypefun |
| |
| @cindex writing to a file descriptor |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| The @code{write} function writes up to @var{size} bytes from |
| @var{buffer} to the file with descriptor @var{filedes}. The data in |
| @var{buffer} is not necessarily a character string and a null character is |
| output like any other character. |
| |
| The return value is the number of bytes actually written. This may be |
| @var{size}, but can always be smaller. Your program should always call |
| @code{write} in a loop, iterating until all the data is written. |
| |
| Once @code{write} returns, the data is enqueued to be written and can be |
| read back right away, but it is not necessarily written out to permanent |
| storage immediately. You can use @code{fsync} when you need to be sure |
| your data has been permanently stored before continuing. (It is more |
| efficient for the system to batch up consecutive writes and do them all |
| at once when convenient. Normally they will always be written to disk |
| within a minute or less.) Modern systems provide another function |
| @code{fdatasync} which guarantees integrity only for the file data and |
| is therefore faster. |
| @c !!! xref fsync, fdatasync |
| You can use the @code{O_FSYNC} open mode to make @code{write} always |
| store the data to disk before returning; @pxref{Operating Modes}. |
| |
| In the case of an error, @code{write} returns @math{-1}. The following |
| @code{errno} error conditions are defined for this function: |
| |
| @table @code |
| @item EAGAIN |
| Normally, @code{write} blocks until the write operation is complete. |
| But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control |
| Operations}), it returns immediately without writing any data and |
| reports this error. An example of a situation that might cause the |
| process to block on output is writing to a terminal device that supports |
| flow control, where output has been suspended by receipt of a STOP |
| character. |
| |
| @strong{Compatibility Note:} Most versions of BSD Unix use a different |
| error code for this: @code{EWOULDBLOCK}. In @theglibc{}, |
| @code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter |
| which name you use. |
| |
| On some systems, writing a large amount of data to a character special |
| file can also fail with @code{EAGAIN} if the kernel cannot find enough |
| physical memory to lock down the user's pages. This is limited to |
| devices that transfer with direct memory access from the user's memory, |
| which means it does not include terminals, since they always use |
| separate buffers inside the kernel. This problem does not arise on |
| @gnuhurdsystems{}. |
| |
| @item EBADF |
| The @var{filedes} argument is not a valid file descriptor, |
| or is not open for writing. |
| |
| @item EFBIG |
| The size of the file would become larger than the implementation can support. |
| |
| @item EINTR |
| The @code{write} operation was interrupted by a signal while it was |
| blocked waiting for completion. A signal will not necessarily cause |
| @code{write} to return @code{EINTR}; it may instead result in a |
| successful @code{write} which writes fewer bytes than requested. |
| @xref{Interrupted Primitives}. |
| |
| @item EIO |
| For many devices, and for disk files, this error code indicates |
| a hardware error. |
| |
| @item ENOSPC |
| The device containing the file is full. |
| |
| @item EPIPE |
| This error is returned when you try to write to a pipe or FIFO that |
| isn't open for reading by any process. When this happens, a @code{SIGPIPE} |
| signal is also sent to the process; see @ref{Signal Handling}. |
| |
| @item EINVAL |
| In some systems, when writing to a character or block device, position |
| and size offsets must be aligned to a particular block size. This error |
| indicates that the offsets were not properly aligned. |
| @end table |
| |
| Unless you have arranged to prevent @code{EINTR} failures, you should |
| check @code{errno} after each failing call to @code{write}, and if the |
| error was @code{EINTR}, you should simply repeat the call. |
| @xref{Interrupted Primitives}. The easy way to do this is with the |
| macro @code{TEMP_FAILURE_RETRY}, as follows: |
| |
| @smallexample |
| nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count)); |
| @end smallexample |
| |
| Please note that there is no function named @code{write64}. This is not |
| necessary since this function does not directly modify or handle the |
| possibly wide file offset. Since the kernel handles this state |
| internally, the @code{write} function can be used for all cases. |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread allocates some resources (like memory, file |
| descriptors, semaphores or whatever) at the time @code{write} is |
| called. If the thread gets canceled these resources stay allocated |
| until the program ends. To avoid this, calls to @code{write} should be |
| protected using cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| |
| The @code{write} function is the underlying primitive for all of the |
| functions that write to streams, such as @code{fputc}. |
| @end deftypefun |
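The advice to call @code{write} in a loop until all the data is written
can be sketched as follows. This version checks @code{errno} for
@code{EINTR} explicitly instead of using @code{TEMP_FAILURE_RETRY}; the
helper name @code{write_all} is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Write all SIZE bytes of BUFFER to FD, continuing after short writes.
   Returns 0 on success, or -1 with errno set on error.
   (write_all is a hypothetical helper name.)  */
static int
write_all (int fd, const void *buffer, size_t size)
{
  const char *p = buffer;

  while (size > 0)
    {
      ssize_t n = write (fd, p, size);
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before writing; retry.  */
          return -1;
        }
      p += n;
      size -= (size_t) n;
    }
  return 0;
}
```

When the function returns @math{-1}, some of the data may already have
been written; the sketch does not report how much.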
| |
| @comment unistd.h |
| @comment Unix98 |
| @deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| @c This is usually a safe syscall. The sysdeps/posix fallback emulation |
| @c is not MT-Safe because it uses lseek, write and lseek back, but is it |
| @c used anywhere? |
| The @code{pwrite} function is similar to the @code{write} function. The |
| first three arguments are identical, and the return values and error codes |
| also correspond. |
| |
| The difference is the fourth argument and its handling. The data block |
| is not written to the current position of the file descriptor |
| @code{filedes}. Instead the data is written to the file starting at |
| position @var{offset}. The position of the file descriptor itself is |
| not affected by the operation. The value is the same as before the call. |
| |
| When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
| @code{pwrite} function is in fact @code{pwrite64} and the type |
| @code{off_t} has 64 bits, which makes it possible to handle files up to |
| @math{2^63} bytes in length. |
| |
| The return value of @code{pwrite} describes the number of written bytes. |
| In the error case it returns @math{-1} like @code{write} does and the |
| error codes are also the same, with these additions: |
| |
| @table @code |
| @item EINVAL |
| The value given for @var{offset} is negative and therefore illegal. |
| |
| @item ESPIPE |
| The file descriptor @var{filedes} is associated with a pipe or a FIFO and |
| this device does not allow positioning of the file pointer. |
| @end table |
| |
| The function is an extension defined in the Single Unix Specification |
| version 2. |
| @end deftypefun |
| |
| @comment unistd.h |
| @comment Unix98 |
| @deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| @c This is usually a safe syscall. The sysdeps/posix fallback emulation |
| @c is not MT-Safe because it uses lseek64, write and lseek64 back, but |
| @c is it used anywhere? |
| This function is similar to the @code{pwrite} function. The difference |
| is that the @var{offset} parameter is of type @code{off64_t} instead of |
| @code{off_t} which makes it possible on 32 bit machines to address |
| files larger than @math{2^31} bytes and up to @math{2^63} bytes. The |
| file descriptor @code{filedes} must be opened using @code{open64} since |
| otherwise the large offsets possible with @code{off64_t} will lead to |
| errors with a descriptor in small file mode. |
| |
| When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a |
| 32 bit machine this function is actually available under the name |
| @code{pwrite} and so transparently replaces the 32 bit interface. |
| @end deftypefun |
| |
| |
| @node File Position Primitive |
| @section Setting the File Position of a Descriptor |
| |
| Just as you can set the file position of a stream with @code{fseek}, you |
| can set the file position of a descriptor with @code{lseek}. This |
| specifies the position in the file for the next @code{read} or |
| @code{write} operation. @xref{File Positioning}, for more information |
| on the file position and what it means. |
| |
| To read the current file position value from a descriptor, use |
| @code{lseek (@var{desc}, 0, SEEK_CUR)}. |
| |
| @cindex file positioning on a file descriptor |
| @cindex positioning a file descriptor |
| @cindex seeking on a file descriptor |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| The @code{lseek} function is used to change the file position of the |
| file with descriptor @var{filedes}. |
| |
| The @var{whence} argument specifies how the @var{offset} should be |
| interpreted, in the same way as for the @code{fseek} function, and it must |
| be one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or |
| @code{SEEK_END}. |
| |
| @table @code |
| @item SEEK_SET |
| Specifies that @var{offset} is a count of characters from the beginning |
| of the file. |
| |
| @item SEEK_CUR |
| Specifies that @var{offset} is a count of characters from the current |
| file position. This count may be positive or negative. |
| |
| @item SEEK_END |
| Specifies that @var{offset} is a count of characters from the end of |
| the file. A negative count specifies a position within the current |
| extent of the file; a positive count specifies a position past the |
| current end. If you set the position past the current end, and |
| actually write data, you will extend the file with zeros up to that |
| position. |
| @end table |
| |
| The return value from @code{lseek} is normally the resulting file |
| position, measured in bytes from the beginning of the file. |
| You can use this feature together with @code{SEEK_CUR} to read the |
| current file position. |
| |
| If you want to append to the file, setting the file position to the |
| current end of file with @code{SEEK_END} is not sufficient. Another |
| process may write more data after you seek but before you write, |
| extending the file so the position you write onto clobbers their data. |
| Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}. |
| |
| You can set the file position past the current end of the file. This |
| does not by itself make the file longer; @code{lseek} never changes the |
| file. But subsequent output at that position will extend the file. |
| Characters between the previous end of file and the new position are |
| filled with zeros. Extending the file in this way can create a |
| ``hole'': the blocks of zeros are not actually allocated on disk, so the |
| file takes up less space than it appears to; it is then called a |
| ``sparse file''. |
| @cindex sparse files |
| @cindex holes in files |
| |
| If the file position cannot be changed, or the operation is in some way |
| invalid, @code{lseek} returns a value of @math{-1}. The following |
| @code{errno} error conditions are defined for this function: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} is not a valid file descriptor. |
| |
| @item EINVAL |
| The @var{whence} argument value is not valid, or the resulting |
| file offset is not valid; for example, it would be negative. |
| |
| @item ESPIPE |
| The @var{filedes} corresponds to an object that cannot be positioned, |
| such as a pipe, FIFO or terminal device. (POSIX.1 specifies this error |
| only for pipes and FIFOs, but on @gnusystems{}, you always get |
| @code{ESPIPE} if the object is not seekable.) |
| @end table |
| |
| When the source file is compiled with @code{_FILE_OFFSET_BITS == 64}, the |
| @code{lseek} function is in fact @code{lseek64} and the type |
| @code{off_t} has 64 bits, which makes it possible to handle files up to |
| @math{2^63} bytes in length. |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread holds resources (such as memory, file |
| descriptors, or semaphores) at the time @code{lseek} is called. If the |
| thread gets canceled, these resources stay allocated until the program |
| ends. To avoid this, calls to @code{lseek} should be protected using |
| cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| |
| The @code{lseek} function is the underlying primitive for the |
| @code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and |
| @code{rewind} functions, which operate on streams instead of file |
| descriptors. |
| @end deftypefun |
| |
| @comment unistd.h |
| @comment Unix98 |
| @deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function is similar to the @code{lseek} function. The difference |
| is that the @var{offset} parameter is of type @code{off64_t} instead of |
| @code{off_t}, which makes it possible on 32-bit machines to address |
| files larger than @math{2^31} bytes and up to @math{2^63} bytes. The |
| file descriptor @var{filedes} must be opened using @code{open64}; |
| otherwise the large offsets possible with @code{off64_t} will lead to |
| errors with a descriptor in small file mode. |
| |
| When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a |
| 32-bit machine, this function is actually available under the name |
| @code{lseek} and so transparently replaces the 32-bit interface. |
| @end deftypefun |
| |
| You can have multiple descriptors for the same file if you open the file |
| more than once, or if you duplicate a descriptor with @code{dup}. |
| Descriptors that come from separate calls to @code{open} have independent |
| file positions; using @code{lseek} on one descriptor has no effect on the |
| other. For example, |
| |
| @smallexample |
| @group |
| @{ |
| int d1, d2; |
| char buf[4]; |
| d1 = open ("foo", O_RDONLY); |
| d2 = open ("foo", O_RDONLY); |
| lseek (d1, 1024, SEEK_SET); |
| read (d2, buf, 4); |
| @} |
| @end group |
| @end smallexample |
| |
| @noindent |
| will read the first four characters of the file @file{foo}. (The |
| error-checking code necessary for a real program has been omitted here |
| for brevity.) |
| |
| By contrast, descriptors made by duplication share a common file |
| position with the original descriptor that was duplicated. Anything |
| which alters the file position of one of the duplicates, including |
| reading or writing data, affects all of them alike. Thus, for example, |
| |
| @smallexample |
| @{ |
| int d1, d2, d3; |
| char buf1[4], buf2[4]; |
| d1 = open ("foo", O_RDONLY); |
| d2 = dup (d1); |
| d3 = dup (d2); |
| lseek (d3, 1024, SEEK_SET); |
| read (d1, buf1, 4); |
| read (d2, buf2, 4); |
| @} |
| @end smallexample |
| |
| @noindent |
| will read four characters starting at offset 1024 of @file{foo}, and |
| then four more characters starting at offset 1028. |
| |
| @comment sys/types.h |
| @comment POSIX.1 |
| @deftp {Data Type} off_t |
| This is a signed integer type used to represent file sizes. In |
| @theglibc{}, this type is no narrower than @code{int}. |
| |
| If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type |
| is transparently replaced by @code{off64_t}. |
| @end deftp |
| |
| @comment sys/types.h |
| @comment Unix98 |
| @deftp {Data Type} off64_t |
| This type is used similarly to @code{off_t}. The difference is that even |
| on 32-bit machines, where the @code{off_t} type would have 32 bits, |
| @code{off64_t} has 64 bits and so is able to address files up to |
| @math{2^63} bytes in length. |
| |
| When compiling with @code{_FILE_OFFSET_BITS == 64} this type is |
| available under the name @code{off_t}. |
| @end deftp |
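| |
| As a hedged sketch of the transparent replacement, the fragment below |
| defines @code{_FILE_OFFSET_BITS} before including any system header and |
| then seeks to an offset that does not fit in 32 bits. The helper name |
| @code{seek_to_3gib} is hypothetical, and the compile-time check uses the |
| ISO C11 @code{_Static_assert} keyword. |

```c
/* Must be defined before the first system header is included.  */
#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fail at compile time if off_t is still narrower than 64 bits.  */
_Static_assert (sizeof (off_t) >= 8, "off_t is narrower than 64 bits");

/* Open NAME and move the file position to an offset of 3 GiB, which
   does not fit in a signed 32-bit offset.  Returns the new position,
   or -1 on failure.  seek_to_3gib is a hypothetical helper.  */
off_t
seek_to_3gib (const char *name)
{
  int fd = open (name, O_RDONLY);
  if (fd < 0)
    return -1;

  off_t target = (off_t) 3 << 30;
  off_t pos = lseek (fd, target, SEEK_SET);

  close (fd);
  return pos;
}
```

| Seeking past the end of a regular file is allowed, so the call can |
| succeed even on a small file; nothing is written. |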
| |
| These aliases for the @samp{SEEK_@dots{}} constants exist for the sake |
| of compatibility with older BSD systems. They are defined in two |
| different header files: @file{fcntl.h} and @file{sys/file.h}. |
| |
| @table @code |
| @item L_SET |
| An alias for @code{SEEK_SET}. |
| |
| @item L_INCR |
| An alias for @code{SEEK_CUR}. |
| |
| @item L_XTND |
| An alias for @code{SEEK_END}. |
| @end table |
| |
| @node Descriptors and Streams |
| @section Descriptors and Streams |
| @cindex streams, and file descriptors |
| @cindex converting file descriptor to stream |
| @cindex extracting file descriptor from stream |
| |
| Given an open file descriptor, you can create a stream for it with the |
| @code{fdopen} function. You can get the underlying file descriptor for |
| an existing stream with the @code{fileno} function. These functions are |
| declared in the header file @file{stdio.h}. |
| @pindex stdio.h |
| |
| @comment stdio.h |
| @comment POSIX.1 |
| @deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{} @asulock{}}@acunsafe{@acsmem{} @aculock{}}} |
| The @code{fdopen} function returns a new stream for the file descriptor |
| @var{filedes}. |
| |
| The @var{opentype} argument is interpreted in the same way as for the |
| @code{fopen} function (@pxref{Opening Streams}), except that |
| the @samp{b} option is not permitted; this is because @gnusystems{} make no |
| distinction between text and binary files. Also, @code{"w"} and |
| @code{"w+"} do not cause truncation of the file; these have an effect only |
| when opening a file, and in this case the file has already been opened. |
| You must make sure that the @var{opentype} argument matches the actual |
| mode of the open file descriptor. |
| |
| The return value is the new stream. If the stream cannot be created |
| (for example, if the modes for the file indicated by the file descriptor |
| do not permit the access specified by the @var{opentype} argument), a |
| null pointer is returned instead. |
| |
| In some other systems, @code{fdopen} may fail to detect that the modes |
| for the file descriptor do not permit the access specified by |
| @var{opentype}. @Theglibc{} always checks for this. |
| @end deftypefun |
| |
| For an example showing the use of the @code{fdopen} function, |
| see @ref{Creating a Pipe}. |
| |
| @comment stdio.h |
| @comment POSIX.1 |
| @deftypefun int fileno (FILE *@var{stream}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function returns the file descriptor associated with the stream |
| @var{stream}. If an error is detected (for example, if the @var{stream} |
| is not valid) or if @var{stream} does not do I/O to a file, |
| @code{fileno} returns @math{-1}. |
| @end deftypefun |
| |
| @comment stdio.h |
| @comment GNU |
| @deftypefun int fileno_unlocked (FILE *@var{stream}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| The @code{fileno_unlocked} function is equivalent to the @code{fileno} |
| function except that it does not implicitly lock the stream if the state |
| is @code{FSETLOCKING_INTERNAL}. |
| |
| This function is a GNU extension. |
| @end deftypefun |
| |
| @cindex standard file descriptors |
| @cindex file descriptors, standard |
| There are also symbolic constants defined in @file{unistd.h} for the |
| file descriptors belonging to the standard streams @code{stdin}, |
| @code{stdout}, and @code{stderr}; see @ref{Standard Streams}. |
| @pindex unistd.h |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @table @code |
| @item STDIN_FILENO |
| @vindex STDIN_FILENO |
| This macro has value @code{0}, which is the file descriptor for |
| standard input. |
| @cindex standard input file descriptor |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @item STDOUT_FILENO |
| @vindex STDOUT_FILENO |
| This macro has value @code{1}, which is the file descriptor for |
| standard output. |
| @cindex standard output file descriptor |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @item STDERR_FILENO |
| @vindex STDERR_FILENO |
| This macro has value @code{2}, which is the file descriptor for |
| standard error output. |
| @end table |
| @cindex standard error file descriptor |
| |
| @node Stream/Descriptor Precautions |
| @section Dangers of Mixing Streams and Descriptors |
| @cindex channels |
| @cindex streams and descriptors |
| @cindex descriptors and streams |
| @cindex mixing descriptors and streams |
| |
| You can have multiple file descriptors and streams (let's call both |
| streams and descriptors ``channels'' for short) connected to the same |
| file, but you must take care to avoid confusion between channels. There |
| are two cases to consider: @dfn{linked} channels that share a single |
| file position value, and @dfn{independent} channels that have their own |
| file positions. |
| |
| It's best to use just one channel in your program for actual data |
| transfer to any given file, except when all the access is for input. |
| For example, if you open a pipe (something you can only do at the file |
| descriptor level), either do all I/O with the descriptor, or construct a |
| stream from the descriptor with @code{fdopen} and then do all I/O with |
| the stream. |
| |
| @menu |
| * Linked Channels:: Dealing with channels sharing a file position. |
| * Independent Channels:: Dealing with separately opened, unlinked channels. |
| * Cleaning Streams:: Cleaning a stream makes it safe to use |
| another channel. |
| @end menu |
| |
| @node Linked Channels |
| @subsection Linked Channels |
| @cindex linked channels |
| |
| Channels that come from a single opening share the same file position; |
| we call them @dfn{linked} channels. Linked channels result when you |
| make a stream from a descriptor using @code{fdopen}, when you get a |
| descriptor from a stream with @code{fileno}, when you copy a descriptor |
| with @code{dup} or @code{dup2}, and when descriptors are inherited |
| during @code{fork}. For files that don't support random access, such as |
| terminals and pipes, @emph{all} channels are effectively linked. On |
| random-access files, all append-type output streams are effectively |
| linked to each other. |
| |
| @cindex cleaning up a stream |
| If you have been using a stream for I/O (or have just opened the stream), |
| and you want to do I/O using |
| another channel (either a stream or a descriptor) that is linked to it, |
| you must first @dfn{clean up} the stream that you have been using. |
| @xref{Cleaning Streams}. |
| |
| Terminating a process, or executing a new program in the process, |
| destroys all the streams in the process. If descriptors linked to these |
| streams persist in other processes, their file positions become |
| undefined as a result. To prevent this, you must clean up the streams |
| before destroying them. |
| |
| @node Independent Channels |
| @subsection Independent Channels |
| @cindex independent channels |
| |
| When you open channels (streams or descriptors) separately on a seekable |
| file, each channel has its own file position. These are called |
| @dfn{independent channels}. |
| |
| The system handles each channel independently. Most of the time, this |
| is quite predictable and natural (especially for input): each channel |
| can read or write sequentially at its own place in the file. However, |
| if some of the channels are streams, you must take these precautions: |
| |
| @itemize @bullet |
| @item |
| You should clean an output stream after use, before doing anything else |
| that might read or write from the same part of the file. |
| |
| @item |
| You should clean an input stream before reading data that may have been |
| modified using an independent channel. Otherwise, you might read |
| obsolete data that had been in the stream's buffer. |
| @end itemize |
| |
| If you do output to one channel at the end of the file, this will |
| certainly leave the other independent channels positioned somewhere |
| before the new end. You cannot reliably set their file positions to the |
| new end of file before writing, because the file can always be extended |
| by another process between when you set the file position and when you |
| write the data. Instead, use an append-type descriptor or stream; they |
| always output at the current end of the file. In order to make the |
| end-of-file position accurate, you must clean the output channel you |
| were using, if it is a stream. |
| |
| It's impossible for two channels to have separate file pointers for a |
| file that doesn't support random access. Thus, channels for reading or |
| writing such files are always linked, never independent. Append-type |
| channels are also always linked. For these channels, follow the rules |
| for linked channels; see @ref{Linked Channels}. |
| |
| @node Cleaning Streams |
| @subsection Cleaning Streams |
| |
| You can use @code{fflush} to clean a stream in most |
| cases. |
| |
| You can skip the @code{fflush} if you know the stream |
| is already clean. A stream is clean whenever its buffer is empty. For |
| example, an unbuffered stream is always clean. An input stream that is |
| at end-of-file is clean. A line-buffered stream is clean when the last |
| character output was a newline. However, a just-opened input stream |
| might not be clean, as its input buffer might not be empty. |
| |
| There is one case in which cleaning a stream is impossible on most |
| systems. This is when the stream is doing input from a file that is not |
| random-access. Such streams typically read ahead, and when the file is |
| not random access, there is no way to give back the excess data already |
| read. When an input stream reads from a random-access file, |
| @code{fflush} does clean the stream, but leaves the file pointer at an |
| unpredictable place; you must set the file pointer before doing any |
| further I/O. |
| |
| Closing an output-only stream also does @code{fflush}, so this is a |
| valid way of cleaning an output stream. |
| |
| You need not clean a stream before using its descriptor for control |
| operations such as setting terminal modes; these operations don't affect |
| the file position and are not affected by it. You can use any |
| descriptor for these operations, and all channels are affected |
| simultaneously. However, text already ``output'' to a stream but still |
| buffered by the stream will be subject to the new terminal modes when |
| subsequently flushed. To make sure ``past'' output is covered by the |
| terminal settings that were in effect at the time, flush the output |
| streams for that terminal before setting the modes. @xref{Terminal |
| Modes}. |
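| |
| As an illustration of cleaning an output stream before switching to a |
| linked descriptor, consider this sketch. The helper name |
| @code{write_in_order} is hypothetical. |

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write FIRST through STREAM and then SECOND through the stream's own
   descriptor, keeping them in that order in the file.  Returns 0 on
   success, -1 on failure.  write_in_order is a hypothetical helper.  */
int
write_in_order (FILE *stream, const char *first, const char *second)
{
  if (fputs (first, stream) == EOF)
    return -1;

  /* Clean the stream: without this, FIRST could still sit in the
     stream's buffer and reach the file after SECOND.  */
  if (fflush (stream) != 0)
    return -1;

  size_t len = strlen (second);
  if (write (fileno (stream), second, len) != (ssize_t) len)
    return -1;
  return 0;
}
```

| Without the @code{fflush} call, the order of the two pieces in the file |
| would depend on when the stream happened to empty its buffer. |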
| |
| @node Scatter-Gather |
| @section Fast Scatter-Gather I/O |
| @cindex scatter-gather |
| |
| Some applications may need to read or write data to multiple buffers, |
| which are separated in memory. Although this can be done easily enough |
| with multiple calls to @code{read} and @code{write}, it is inefficient |
| because there is overhead associated with each kernel call. |
| |
| Instead, many platforms provide special high-speed primitives to perform |
| these @dfn{scatter-gather} operations in a single kernel call. @Theglibc{} |
| provides an emulation on any system that lacks these |
| primitives, so they are not a portability threat. They are declared in |
| @file{sys/uio.h}. |
| |
| These functions are controlled with arrays of @code{iovec} structures, |
| which describe the location and size of each buffer. |
| |
| @comment sys/uio.h |
| @comment BSD |
| @deftp {Data Type} {struct iovec} |
| |
| The @code{iovec} structure describes a buffer. It contains two fields: |
| |
| @table @code |
| |
| @item void *iov_base |
| Contains the address of a buffer. |
| |
| @item size_t iov_len |
| Contains the length of the buffer. |
| |
| @end table |
| @end deftp |
| |
| @comment sys/uio.h |
| @comment BSD |
| @deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| @c The fallback sysdeps/posix implementation, used even on GNU/Linux |
| @c with old kernels that lack a full readv/writev implementation, may |
| @c malloc the buffer into which data is read, if the total read size is |
| @c too large for alloca. |
| |
| The @code{readv} function reads data from @var{filedes} and scatters it |
| into the buffers described in @var{vector}, which is taken to be |
| @var{count} structures long. As each buffer is filled, data is sent to the |
| next. |
| |
| Note that @code{readv} is not guaranteed to fill all the buffers. |
| It may stop at any point, for the same reasons @code{read} would. |
| |
| The return value is a count of bytes (@emph{not} buffers) read, @math{0} |
| indicating end-of-file, or @math{-1} indicating an error. The possible |
| errors are the same as in @code{read}. |
| |
| @end deftypefun |
| |
| @comment sys/uio.h |
| @comment BSD |
| @deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} |
| @c The fallback sysdeps/posix implementation, used even on GNU/Linux |
| @c with old kernels that lack a full readv/writev implementation, may |
| @c malloc the buffer from which data is written, if the total write size |
| @c is too large for alloca. |
| |
| The @code{writev} function gathers data from the buffers described in |
| @var{vector}, which is taken to be @var{count} structures long, and writes |
| it to @var{filedes}. As each buffer is written, it moves on to the |
| next. |
| |
| Like @code{readv}, @code{writev} may stop midstream under the same |
| conditions @code{write} would. |
| |
| The return value is a count of bytes written, or @math{-1} indicating an |
| error. The possible errors are the same as in @code{write}. |
| |
| @end deftypefun |
| |
| @c Note - I haven't read this anywhere. I surmised it from my knowledge |
| @c of computer science. Thus, there could be subtleties I'm missing. |
| |
| Note that if the buffers are small (under about 1kB), high-level streams |
| may be easier to use than these functions. However, @code{readv} and |
| @code{writev} are more efficient when the individual buffers themselves |
| (as opposed to the total output) are large. In that case, a high-level |
| stream would not be able to cache the data effectively. |
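| |
| The following sketch shows both directions with two buffers each. The |
| helper names @code{write_two_parts} and @code{read_two_parts} are |
| hypothetical. As with @code{read} and @code{write}, a real program must |
| be prepared for a short count. |

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write HEADER followed by BODY to FILEDES with a single writev call.
   Returns the number of bytes written, or -1 on error.
   write_two_parts is a hypothetical helper.  */
ssize_t
write_two_parts (int filedes, const char *header, const char *body)
{
  struct iovec vector[2];

  /* iov_base is not const-qualified, hence the casts; writev only
     reads from the buffers.  */
  vector[0].iov_base = (void *) header;
  vector[0].iov_len = strlen (header);
  vector[1].iov_base = (void *) body;
  vector[1].iov_len = strlen (body);

  return writev (filedes, vector, 2);
}

/* Read from FILEDES, filling BUF1 (LEN1 bytes) first and then BUF2
   (LEN2 bytes).  Returns the number of bytes read, 0 at end of file,
   or -1 on error.  read_two_parts is a hypothetical helper.  */
ssize_t
read_two_parts (int filedes, void *buf1, size_t len1,
                void *buf2, size_t len2)
{
  struct iovec vector[2] = {
    { .iov_base = buf1, .iov_len = len1 },
    { .iov_base = buf2, .iov_len = len2 },
  };

  return readv (filedes, vector, 2);
}
```

| The return value counts bytes across all buffers, so the caller has to |
| work out from the lengths which buffers were completely filled. |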
| |
| @node Memory-mapped I/O |
| @section Memory-mapped I/O |
| |
| On modern operating systems, it is possible to @dfn{mmap} (pronounced |
| ``em-map'') a file to a region of memory. When this is done, the file can |
| be accessed just like an array in the program. |
| |
| This is more efficient than @code{read} or @code{write}, as only the regions |
| of the file that a program actually accesses are loaded. Accesses to |
| not-yet-loaded parts of the mmapped region are handled in the same way as |
| swapped out pages. |
| |
| Since mmapped pages can be stored back to their file when physical |
| memory is low, it is possible to mmap files orders of magnitude larger |
| than both the physical memory @emph{and} swap space. The only limit is |
| address space. The theoretical limit is 4GB on a 32-bit machine; |
| however, the actual limit will be smaller since some areas will be |
| reserved for other purposes. If the LFS interface is used, the file size |
| on 32-bit systems is not limited to 2GB (offsets are signed, which |
| reduces the addressable area of 4GB by half); the full 64 bits are |
| available. |
| |
| Memory mapping only works on entire pages of memory. Thus, addresses |
| for mapping must be page-aligned, and length values will be rounded up. |
| To determine the page size the machine uses, call @code{sysconf}: |
| |
| @vindex _SC_PAGESIZE |
| @smallexample |
| size_t page_size = (size_t) sysconf (_SC_PAGESIZE); |
| @end smallexample |
| |
| @noindent |
| These functions are declared in @file{sys/mman.h}. |
| |
| @comment sys/mman.h |
| @comment POSIX |
| @deftypefun {void *} mmap (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| |
| The @code{mmap} function creates a new mapping, connected to bytes |
| (@var{offset}) to (@var{offset} + @var{length} - 1) in the file open on |
| @var{filedes}. A new reference for the file specified by @var{filedes} |
| is created, which is not removed by closing the file. |
| |
| @var{address} gives a preferred starting address for the mapping. |
| @code{NULL} expresses no preference. Any previous mapping at that |
| address is automatically removed. The address you give may still be |
| changed, unless you use the @code{MAP_FIXED} flag. |
| |
| @vindex PROT_READ |
| @vindex PROT_WRITE |
| @vindex PROT_EXEC |
| @var{protect} contains flags that control what kind of access is |
| permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and |
| @code{PROT_EXEC}, which permit reading, writing, and execution, |
| respectively. Inappropriate access will cause a segfault (@pxref{Program |
| Error Signals}). |
| |
| Note that most hardware designs cannot support write permission without |
| read permission, and many do not distinguish read and execute permission. |
| Thus, you may receive wider permissions than you ask for, and mappings of |
| write-only files may be denied even if you do not use @code{PROT_READ}. |
| |
| @var{flags} contains flags that control the nature of the map. |
| One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified. |
| |
| They include: |
| |
| @vtable @code |
| @item MAP_PRIVATE |
| This specifies that writes to the region should never be written back |
| to the attached file. Instead, a copy is made for the process, and the |
| region will be swapped normally if memory runs low. No other process will |
| see the changes. |
| |
| Since private mappings effectively revert to ordinary memory |
| when written to, you must have enough virtual memory for a copy of |
| the entire mmapped region if you use this mode with @code{PROT_WRITE}. |
| |
| @item MAP_SHARED |
| This specifies that writes to the region will be written back to the |
| file. Changes made will be shared immediately with other processes |
| mmapping the same file. |
| |
| Note that actual writing may take place at any time. You need to use |
| @code{msync}, described below, if it is important that other processes |
| using conventional I/O get a consistent view of the file. |
| |
| @item MAP_FIXED |
| This forces the system to use the exact mapping address specified in |
| @var{address} and fail if it can't. |
| |
| @c One of these is official - the other is obviously an obsolete synonym |
| @c Which is which? |
| @item MAP_ANONYMOUS |
| @itemx MAP_ANON |
| This flag tells the system to create an anonymous mapping, not connected |
| to a file. @var{filedes} and @var{offset} are ignored, and the region is |
| initialized with zeros. |
| |
| Anonymous maps are used as the basic primitive to extend the heap on some |
| systems. They are also useful to share data between multiple tasks |
| without creating a file. |
| |
| On some systems using private anonymous mmaps is more efficient than using |
| @code{malloc} for large blocks. This is not an issue with @theglibc{}, |
| as the included @code{malloc} automatically uses @code{mmap} where appropriate. |
| |
| @c Linux has some other MAP_ options, which I have not discussed here. |
| @c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to |
| @c user programs (and I don't understand the last two). MAP_LOCKED does |
| @c not appear to be implemented. |
| |
| @end vtable |
| |
| @code{mmap} returns the address of the new mapping, or |
| @code{MAP_FAILED} for an error. |
| |
| Possible errors include: |
| |
| @table @code |
| |
| @item EINVAL |
| |
| Either @var{address} was unusable, or inconsistent @var{flags} were |
| given. |
| |
| @item EACCES |
| |
| @var{filedes} was not open for the type of access specified in @var{protect}. |
| |
| @item ENOMEM |
| |
| Either there is not enough memory for the operation, or the process is |
| out of address space. |
| |
| @item ENODEV |
| |
| This file is of a type that doesn't support mapping. |
| |
| @item ENOEXEC |
| |
| The file is on a filesystem that doesn't support mapping. |
| |
| @c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock. |
| @c However mandatory locks are not discussed in this manual. |
| @c |
| @c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented |
| @c here) is used and the file is already open for writing. |
| |
| @end table |
| |
| @end deftypefun |
| |
| @comment sys/mman.h |
| @comment LFS |
| @deftypefun {void *} mmap64 (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off64_t @var{offset}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| @c The page_shift auto detection when MMAP2_PAGE_SHIFT is -1 (it never |
| @c is) would be thread-unsafe. |
| The @code{mmap64} function is equivalent to the @code{mmap} function but |
| the @var{offset} parameter is of type @code{off64_t}. On 32-bit systems |
| this allows the file associated with the @var{filedes} descriptor to be |
| larger than 2GB. @var{filedes} must be a descriptor returned from a |
| call to @code{open64}, or one retrieved with @code{fileno} from a stream |
| opened with @code{fopen64} or @code{freopen64}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is actually available under the name @code{mmap}. That is, the |
| new, extended API using 64-bit file sizes and offsets transparently |
| replaces the old API. |
| @end deftypefun |
| |
| @comment sys/mman.h |
| @comment POSIX |
| @deftypefun int munmap (void *@var{addr}, size_t @var{length}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| |
| @code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} + |
| @var{length}). @var{length} should be the length of the mapping. |
| |
| It is safe to unmap multiple mappings in one call, or include unmapped |
| space in the range. It is also possible to unmap only part of an existing |
| mapping. However, only entire pages can be removed. If @var{length} is not |
| a whole number of pages, it will be rounded up. |
| |
| It returns @math{0} for success and @math{-1} for an error. |
| |
| One error is possible: |
| |
| @table @code |
| |
| @item EINVAL |
| The memory range given was outside the user mmap range or wasn't page |
| aligned. |
| |
| @end table |
| |
| @end deftypefun |
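| |
| To show @code{mmap} and @code{munmap} working together, here is a |
| minimal sketch that maps a file read-only and scans it like an array. |
| The helper name @code{count_newlines} is hypothetical. |

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the newline characters in the file NAME by mapping it
   read-only.  Returns the count, or -1 on failure.
   count_newlines is a hypothetical helper.  */
long
count_newlines (const char *name)
{
  int fd = open (name, O_RDONLY);
  if (fd < 0)
    return -1;

  struct stat st;
  if (fstat (fd, &st) != 0)
    {
      close (fd);
      return -1;
    }
  if (st.st_size == 0)
    {
      /* A zero-length mapping is an error, so handle empty files
         without calling mmap.  */
      close (fd);
      return 0;
    }

  size_t length = (size_t) st.st_size;
  char *data = mmap (NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);

  /* The mapping holds its own reference to the file, so the
     descriptor can be closed right away.  */
  close (fd);
  if (data == MAP_FAILED)
    return -1;

  long count = 0;
  for (size_t i = 0; i < length; i++)
    if (data[i] == '\n')
      count++;

  munmap (data, length);
  return count;
}
```

| Note that the result of @code{mmap} is compared against |
| @code{MAP_FAILED}, not against a null pointer. |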
| |
| @comment sys/mman.h |
| @comment POSIX |
| @deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| |
| When using shared mappings, the kernel can write the file at any time |
| before the mapping is removed. To be certain data has actually been |
| written to the file and will be accessible to non-memory-mapped I/O, it |
| is necessary to use this function. |
| |
| It operates on the region @var{address} to (@var{address} + @var{length}). |
| It may be used on part of a mapping or multiple mappings; however, the |
| region given should not contain any unmapped space. |
| |
| @var{flags} can contain some options: |
| |
| @vtable @code |
| |
| @item MS_SYNC |
| |
| This flag makes sure the data is actually written @emph{to disk}. |
| Normally @code{msync} only makes sure that accesses to a file with |
| conventional I/O reflect the recent changes. |
| |
| @item MS_ASYNC |
| |
| This tells @code{msync} to begin the synchronization, but not to wait for |
| it to complete. |
| |
| @c Linux also has MS_INVALIDATE, which I don't understand. |
| |
| @end vtable |
| |
| @code{msync} returns @math{0} for success and @math{-1} for |
| error. Errors include: |
| |
| @table @code |
| |
| @item EINVAL |
| An invalid region was given, or the @var{flags} were invalid. |
| |
| @item EFAULT |
| There is no existing mapping in at least part of the given region. |
| |
| @end table |
| |
| @end deftypefun |
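| |
| A sketch of using @code{msync} with a shared mapping follows. It |
| assumes a hypothetical helper named @code{patch_file} and a file that |
| already exists and is at least @var{len} bytes long. |

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite the first LEN bytes of the existing file NAME with TEXT
   through a shared mapping, then wait until the change has been
   written back.  The file must already be at least LEN bytes long
   and LEN must be nonzero.  Returns 0 on success, -1 on failure.
   patch_file is a hypothetical helper.  */
int
patch_file (const char *name, const char *text, size_t len)
{
  int fd = open (name, O_RDWR);
  if (fd < 0)
    return -1;

  char *data = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);
  if (data == MAP_FAILED)
    return -1;

  memcpy (data, text, len);

  /* MS_SYNC makes msync wait until the modified pages are written.  */
  int status = msync (data, len, MS_SYNC);

  if (munmap (data, len) != 0)
    status = -1;
  return status;
}
```

| After @code{patch_file} returns successfully, a plain @code{read} of the |
| file sees the new bytes. |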
| |
| @comment sys/mman.h |
| @comment GNU |
| @deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| |
| This function can be used to change the size of an existing memory |
| area. @var{address} and @var{length} must cover a region entirely mapped |
| by a single @code{mmap} call. A new mapping with the same |
| characteristics will be returned with the length @var{new_length}. |
| |
| One option is possible, @code{MREMAP_MAYMOVE}. If it is given in |
| @var{flag}, the system may remove the existing mapping and create a new |
| one of the desired length in another location. |
| |
| The address of the resulting mapping is returned, or @code{MAP_FAILED} |
| (that is, @code{(void *) -1}) on error. Possible error codes include: |
| |
| @table @code |
| |
| @item EFAULT |
| There is no existing mapping in at least part of the original region, or |
| the region covers two or more distinct mappings. |
| |
| @item EINVAL |
| The address given is misaligned or inappropriate. |
| |
| @item EAGAIN |
| The region has pages locked, and if extended it would exceed the |
| process's resource limit for locked pages. @xref{Limits on Resources}. |
| |
| @item ENOMEM |
| The region is private writable, and insufficient virtual memory is |
| available to extend it. Also, this error will occur if |
| @code{MREMAP_MAYMOVE} is not given and the extension would collide with |
| another mapped region. |
| |
| @end table |
| @end deftypefun |
| |
| The @code{mremap} function is only available on a few systems. Do not |
| rely on it except to perform optional optimizations. |
| |
| Not all file descriptors may be mapped. Sockets, pipes, and most devices |
| only allow sequential access and do not fit into the mapping abstraction. |
| In addition, some regular files may not be mmapable, and older kernels may |
| not support mapping at all. Thus, programs using @code{mmap} should |
| have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU |
| Coding Standards}. |
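| |
| One possible shape for such a fallback is sketched below. The helper |
| name @code{load_bytes} and its @var{mapped} out-parameter are |
| assumptions made for illustration; for brevity the sketch treats an |
| interrupted @code{read} as a failure. |

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Make the first LEN bytes of FILEDES available in memory.  LEN must
   be nonzero.  mmap is tried first; if it fails, the data is read
   into a malloc'd buffer instead.  On success *MAPPED tells the
   caller how to release the result: with munmap if nonzero, with
   free otherwise.  Returns NULL on failure.  load_bytes is a
   hypothetical helper.  */
void *
load_bytes (int filedes, size_t len, int *mapped)
{
  void *data = mmap (NULL, len, PROT_READ, MAP_PRIVATE, filedes, 0);
  if (data != MAP_FAILED)
    {
      *mapped = 1;
      return data;
    }

  /* Fallback for pipes, sockets and other unmappable objects.  */
  char *buf = malloc (len);
  if (buf == NULL)
    return NULL;

  size_t done = 0;
  while (done < len)
    {
      ssize_t n = read (filedes, buf + done, len - done);
      if (n <= 0)
        {
          /* Error or premature end of file.  */
          free (buf);
          return NULL;
        }
      done += (size_t) n;
    }

  *mapped = 0;
  return buf;
}
```

| On a pipe the @code{mmap} attempt fails, so the data arrives through |
| @code{read} and @var{mapped} is set to zero. |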
| |
| @comment sys/mman.h |
| @comment POSIX |
| @deftypefun int madvise (void *@var{addr}, size_t @var{length}, int @var{advice}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| |
| This function can be used to provide the system with @var{advice} about |
| the intended usage patterns of the memory region starting at @var{addr} |
| and extending @var{length} bytes. |
| |
| The valid BSD values for @var{advice} are: |
| |
| @table @code |
| |
| @item MADV_NORMAL |
| The region should receive no further special treatment. |
| |
| @item MADV_RANDOM |
| The region will be accessed via random page references. The kernel |
| should page-in the minimal number of pages for each page fault. |
| |
| @item MADV_SEQUENTIAL |
| The region will be accessed via sequential page references. This |
| may cause the kernel to aggressively read-ahead, expecting further |
| sequential references after any page fault within this region. |
| |
| @item MADV_WILLNEED |
| The region will be needed. The pages within this region may |
| be pre-faulted in by the kernel. |
| |
| @item MADV_DONTNEED |
| The region is no longer needed. The kernel may free these pages, |
| causing any changes to the pages to be lost, as well as swapped |
| out pages to be discarded. |
| |
| @end table |
| |
| The POSIX names are slightly different, but with the same meanings: |
| |
| @table @code |
| |
| @item POSIX_MADV_NORMAL |
| This corresponds with BSD's @code{MADV_NORMAL}. |
| |
| @item POSIX_MADV_RANDOM |
| This corresponds with BSD's @code{MADV_RANDOM}. |
| |
| @item POSIX_MADV_SEQUENTIAL |
| This corresponds with BSD's @code{MADV_SEQUENTIAL}. |
| |
| @item POSIX_MADV_WILLNEED |
| This corresponds with BSD's @code{MADV_WILLNEED}. |
| |
| @item POSIX_MADV_DONTNEED |
| This corresponds with BSD's @code{MADV_DONTNEED}. |
| |
| @end table |
| |
| @code{madvise} returns @math{0} for success and @math{-1} for |
| error. Errors include: |
| @table @code |
| |
| @item EINVAL |
| An invalid region was given, or the @var{advice} was invalid. |
| |
| @item EFAULT |
| There is no existing mapping in at least part of the given region. |
| |
| @end table |
| @end deftypefun |
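| |
| As a minimal sketch of how the advice values might be used, the |
| following hypothetical helper @code{sum_file_bytes} (not part of any |
| library) maps a file read-only, tells the kernel that it will be scanned |
| from start to end, and then adds up its bytes. It assumes a system that |
| provides the BSD @code{MADV_SEQUENTIAL} name, and it treats a failing |
| @code{madvise} call as harmless because the advice is only a hint. |

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: add up all bytes of the file PATH by mapping it
   and scanning it once from start to end.  Returns 0 on success and
   -1 on failure; an empty file counts as a failure because a
   zero-length mapping cannot be created.  */
int
sum_file_bytes (const char *path, unsigned long *sum)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct stat st;
  if (fstat (fd, &st) < 0 || st.st_size == 0)
    {
      close (fd);
      return -1;
    }

  size_t length = (size_t) st.st_size;
  unsigned char *p = mmap (NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
  close (fd);   /* The mapping stays valid after the descriptor is closed.  */
  if (p == MAP_FAILED)
    return -1;

  /* The advice is only a hint, so a failure here is ignored.  */
  (void) madvise (p, length, MADV_SEQUENTIAL);

  unsigned long total = 0;
  for (size_t i = 0; i < length; i++)
    total += p[i];
  *sum = total;

  return munmap (p, length);
}
```

| A program that jumps around in the mapping instead would pass |
| @code{MADV_RANDOM} in the same place. |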
| |
| @comment sys/mman.h |
| @comment POSIX |
| @deftypefn Function int shm_open (const char *@var{name}, int @var{oflag}, mode_t @var{mode}) |
| @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asuinit{} @ascuheap{} @asulock{}}@acunsafe{@aculock{} @acsmem{} @acsfd{}}} |
| @c shm_open @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd |
| @c libc_once(where_is_shmfs) @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd |
| @c where_is_shmfs @mtslocale @ascuheap @asulock @aculock @acsmem @acsfd |
| @c statfs dup ok |
| @c setmntent dup @ascuheap @asulock @acsmem @acsfd @aculock |
| @c getmntent_r dup @mtslocale @ascuheap @aculock @acsmem [no @asucorrupt @acucorrupt; exclusive stream] |
| @c strcmp dup ok |
| @c strlen dup ok |
| @c malloc dup @ascuheap @acsmem |
| @c mempcpy dup ok |
| @c endmntent dup @ascuheap @asulock @aculock @acsmem @acsfd |
| @c strlen dup ok |
| @c strchr dup ok |
| @c mempcpy dup ok |
| @c open dup @acsfd |
| @c fcntl dup ok |
| @c close dup @acsfd |
| |
| This function returns a file descriptor that can be used to allocate shared |
| memory via @code{mmap}. Unrelated processes can use the same @var{name} to |
| create or open existing shared memory objects. |
| |
| The @var{name} argument specifies the shared memory object to be opened. |
| In @theglibc{} it must be a string shorter than @code{NAME_MAX} bytes starting |
| with an optional slash but containing no other slashes. |
| |
| The semantics of the @var{oflag} and @var{mode} arguments are the same as in |
| @code{open}. |
| |
| @code{shm_open} returns the file descriptor on success or @math{-1} on error. |
| On failure @code{errno} is set. |
| @end deftypefn |
| |
| @deftypefn Function int shm_unlink (const char *@var{name}) |
| @safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asuinit{} @ascuheap{} @asulock{}}@acunsafe{@aculock{} @acsmem{} @acsfd{}}} |
| @c shm_unlink @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd |
| @c libc_once(where_is_shmfs) dup @mtslocale @asuinit @ascuheap @asulock @aculock @acsmem @acsfd |
| @c strlen dup ok |
| @c strchr dup ok |
| @c mempcpy dup ok |
| @c unlink dup ok |
| |
| This function is the inverse of @code{shm_open} and removes the object with |
| the given @var{name} previously created by @code{shm_open}. |
| |
| @code{shm_unlink} returns @math{0} on success or @math{-1} on error. |
| On failure @code{errno} is set. |
| @end deftypefn |
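| |
| The following sketch shows one way the two functions might be combined |
| with @code{ftruncate} and @code{mmap}. The helper name |
| @code{map_shared_object} and the permission bits @code{0600} are |
| choices made for this illustration only. The sketch assumes that a newly |
| created object has length zero and therefore sets its size before |
| mapping it. |

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create (or open) the shared memory object NAME,
   give it SIZE bytes, and map it read/write.  Returns the address of
   the mapping, or NULL on failure.  */
void *
map_shared_object (const char *name, size_t size)
{
  int fd = shm_open (name, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return NULL;

  /* Set the size of the object before mapping it.  */
  if (ftruncate (fd, (off_t) size) < 0)
    {
      close (fd);
      return NULL;
    }

  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);   /* The mapping remains usable without the descriptor.  */
  return p == MAP_FAILED ? NULL : p;
}
```

| Two processes that call this helper with the same @var{name} see the |
| same memory. Once the object is no longer needed, one of them calls |
| @code{shm_unlink} with that name to remove it. |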
| |
| @node Waiting for I/O |
| @section Waiting for Input or Output |
| @cindex waiting for input or output |
| @cindex multiplexing input |
| @cindex input from multiple files |
| |
| Sometimes a program needs to accept input on multiple input channels |
| whenever input arrives. For example, some workstations may have devices |
| such as a digitizing tablet, function button box, or dial box that are |
| connected via normal asynchronous serial interfaces; good user interface |
| style requires responding immediately to input on any device. Another |
| example is a program that acts as a server to several other processes |
| via pipes or sockets. |
| |
| You cannot normally use @code{read} for this purpose, because this |
| blocks the program until input is available on one particular file |
| descriptor; input on other channels won't wake it up. You could set |
| nonblocking mode and poll each file descriptor in turn, but this is very |
| inefficient. |
| |
| A better solution is to use the @code{select} function. This blocks the |
| program until input or output is ready on a specified set of file |
| descriptors, or until a timer expires, whichever comes first. This |
| facility is declared in the header file @file{sys/types.h}. |
| @pindex sys/types.h |
| |
| In the case of a server socket (@pxref{Listening}), we say that |
| ``input'' is available when there are pending connections that could be |
| accepted (@pxref{Accepting Connections}). @code{accept} for server |
| sockets blocks and interacts with @code{select} just as @code{read} does |
| for normal input. |
| |
| @cindex file descriptor sets, for @code{select} |
| The file descriptor sets for the @code{select} function are specified |
| as @code{fd_set} objects. Here is the description of the data type |
| and some macros for manipulating these objects. |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftp {Data Type} fd_set |
| The @code{fd_set} data type represents file descriptor sets for the |
| @code{select} function. It is actually a bit array. |
| @end deftp |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftypevr Macro int FD_SETSIZE |
| The value of this macro is the maximum number of file descriptors that a |
| @code{fd_set} object can hold information about. On systems with a |
| fixed maximum number, @code{FD_SETSIZE} is at least that number. On |
| some systems, including GNU, there is no absolute limit on the number of |
| descriptors open, but this macro still has a constant value which |
| controls the number of bits in an @code{fd_set}; if you get a file |
| descriptor with a value as high as @code{FD_SETSIZE}, you cannot put |
| that descriptor into an @code{fd_set}. |
| @end deftypevr |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftypefn Macro void FD_ZERO (fd_set *@var{set}) |
| @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
| This macro initializes the file descriptor set @var{set} to be the |
| empty set. |
| @end deftypefn |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set}) |
| @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
| @c Setting a bit isn't necessarily atomic, so there's a potential race |
| @c here if set is not used exclusively. |
| This macro adds @var{filedes} to the file descriptor set @var{set}. |
| |
| The @var{filedes} parameter must not have side effects since it is |
| evaluated more than once. |
| @end deftypefn |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set}) |
| @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
| @c Setting a bit isn't necessarily atomic, so there's a potential race |
| @c here if set is not used exclusively. |
| This macro removes @var{filedes} from the file descriptor set @var{set}. |
| |
| The @var{filedes} parameter must not have side effects since it is |
| evaluated more than once. |
| @end deftypefn |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftypefn Macro int FD_ISSET (int @var{filedes}, const fd_set *@var{set}) |
| @safety{@prelim{}@mtsafe{@mtsrace{:set}}@assafe{}@acsafe{}} |
| This macro returns a nonzero value (true) if @var{filedes} is a member |
| of the file descriptor set @var{set}, and zero (false) otherwise. |
| |
| The @var{filedes} parameter must not have side effects since it is |
| evaluated more than once. |
| @end deftypefn |
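| |
| As an illustration of these macros, here is a small sketch of a |
| hypothetical helper, @code{build_fd_set}, that fills a set from an |
| array of descriptors and rejects any descriptor that does not fit below |
| @code{FD_SETSIZE}. It includes @file{sys/select.h}, the header POSIX |
| names for these macros. The descriptor is copied into a plain variable |
| first, so the macro arguments have no side effects. |

```c
#include <sys/select.h>

/* Hypothetical helper: make SET contain the descriptors FDS[0..N-1].
   Returns the highest descriptor plus one, or -1 if a descriptor
   cannot be stored in an fd_set.  */
int
build_fd_set (fd_set *set, const int *fds, int n)
{
  int nfds = 0;

  FD_ZERO (set);
  for (int i = 0; i < n; i++)
    {
      int fd = fds[i];          /* Plain variable: no side effects.  */
      if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
      FD_SET (fd, set);
      if (fd + 1 > nfds)
        nfds = fd + 1;
    }
  return nfds;
}
```

| Membership can then be tested with @code{FD_ISSET}, and a single |
| descriptor can be taken out again with @code{FD_CLR}. |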
| |
| Next, here is the description of the @code{select} function itself. |
| |
| @comment sys/types.h |
| @comment BSD |
| @deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout}) |
| @safety{@prelim{}@mtsafe{@mtsrace{:read-fds} @mtsrace{:write-fds} @mtsrace{:except-fds}}@assafe{}@acsafe{}} |
| @c The select syscall is preferred, but pselect6 may be used instead, |
| @c which requires converting timeout to a timespec and back. The |
| @c conversions are not atomic. |
| The @code{select} function blocks the calling process until there is |
| activity on any of the specified sets of file descriptors, or until the |
| timeout period has expired. |
| |
| The file descriptors specified by the @var{read-fds} argument are |
| checked to see if they are ready for reading; the @var{write-fds} file |
| descriptors are checked to see if they are ready for writing; and the |
| @var{except-fds} file descriptors are checked for exceptional |
| conditions. You can pass a null pointer for any of these arguments if |
| you are not interested in checking for that kind of condition. |
| |
| A file descriptor is considered ready for reading if a @code{read} |
| call will not block. This usually includes the cases where the read |
| offset is at the end of the file or there is an error to report. A server socket |
| is considered ready for reading if there is a pending connection which |
| can be accepted with @code{accept}; @pxref{Accepting Connections}. A |
| client socket is ready for writing when its connection is fully |
| established; @pxref{Connecting}. |
| |
| ``Exceptional conditions'' does not mean errors---errors are reported |
| immediately when an erroneous system call is executed, and do not |
| constitute a state of the descriptor. Rather, they include conditions |
| such as the presence of an urgent message on a socket. (@xref{Sockets}, |
| for information on urgent messages.) |
| |
| The @code{select} function checks only the first @var{nfds} file |
| descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value |
| of this argument. |
| |
| The @var{timeout} specifies the maximum time to wait. If you pass a |
| null pointer for this argument, it means to block indefinitely until one |
| of the file descriptors is ready. Otherwise, you should provide the |
| time in @code{struct timeval} format; see @ref{High-Resolution |
| Calendar}. Specify zero as the time (a @code{struct timeval} containing |
| all zeros) if you want to find out which descriptors are ready without |
| waiting if none are ready. |
| |
| The normal return value from @code{select} is the total number of ready file |
| descriptors in all of the sets. Each of the argument sets is overwritten |
| with information about the descriptors that are ready for the corresponding |
| operation. Thus, to see if a particular descriptor @var{desc} has input, |
| use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns. |
| |
| If @code{select} returns because the timeout period expires, it returns |
| a value of zero. |
| |
| Any signal will cause @code{select} to return immediately. So if your |
| program uses signals, you can't rely on @code{select} to keep waiting |
| for the full time specified. If you want to be sure of waiting for a |
| particular amount of time, you must check for @code{EINTR} and repeat |
| the @code{select} with a newly calculated timeout based on the current |
| time. See the example below. See also @ref{Interrupted Primitives}. |
| |
| If an error occurs, @code{select} returns @code{-1} and does not modify |
| the argument file descriptor sets. The following @code{errno} error |
| conditions are defined for this function: |
| |
| @table @code |
| @item EBADF |
| One of the file descriptor sets specified an invalid file descriptor. |
| |
| @item EINTR |
| The operation was interrupted by a signal. @xref{Interrupted Primitives}. |
| |
| @item EINVAL |
| The @var{timeout} argument is invalid; one of the components is negative |
| or too large. |
| @end table |
| @end deftypefun |
| |
| @strong{Portability Note:} The @code{select} function is a BSD Unix |
| feature. |
| |
| Here is an example showing how you can use @code{select} to establish a |
| timeout period for reading from a file descriptor. The @code{input_timeout} |
| function blocks the calling process until input is available on the |
| file descriptor, or until the timeout period expires. |
| |
| @smallexample |
| @include select.c.texi |
| @end smallexample |
| |
| There is another example showing the use of @code{select} to multiplex |
| input from multiple sockets in @ref{Server Example}. |
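| |
| The advice about signals can also be followed by recomputing the |
| timeout by hand. The sketch below is one possible way to do it, not the |
| only one. The helper name @code{read_with_timeout} is hypothetical, and |
| the sketch assumes that @code{clock_gettime} with |
| @code{CLOCK_MONOTONIC} is available. It fixes a deadline once, and after |
| every @code{EINTR} it passes @code{select} only the time that remains. |

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read up to N bytes from FD, waiting at most
   TIMEOUT_MS milliseconds for input to arrive.  Returns the result of
   read, or -1 with errno set (ETIMEDOUT if the time ran out).  */
ssize_t
read_with_timeout (int fd, void *buf, size_t n, long timeout_ms)
{
  struct timespec now, deadline;

  if (fd < 0 || fd >= FD_SETSIZE)
    {
      errno = EBADF;
      return -1;
    }
  if (clock_gettime (CLOCK_MONOTONIC, &deadline) < 0)
    return -1;
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L)
    {
      deadline.tv_sec++;
      deadline.tv_nsec -= 1000000000L;
    }

  for (;;)
    {
      /* Recompute the remaining time on every pass, so that a signal
         does not restart the full timeout.  */
      if (clock_gettime (CLOCK_MONOTONIC, &now) < 0)
        return -1;
      long long left_us
        = (long long) (deadline.tv_sec - now.tv_sec) * 1000000LL
          + (deadline.tv_nsec - now.tv_nsec) / 1000;
      if (left_us < 0)
        left_us = 0;

      struct timeval tv;
      tv.tv_sec = left_us / 1000000;
      tv.tv_usec = left_us % 1000000;

      /* select overwrites the set, so build it again each time.  */
      fd_set set;
      FD_ZERO (&set);
      FD_SET (fd, &set);

      int ready = select (fd + 1, &set, NULL, NULL, &tv);
      if (ready > 0)
        return read (fd, buf, n);
      if (ready == 0)
        {
          errno = ETIMEDOUT;
          return -1;
        }
      if (errno != EINTR)
        return -1;
    }
}
```

| Passing @code{fd + 1} rather than @code{FD_SETSIZE} as the first |
| argument is a choice of this sketch; it keeps @code{select} from |
| examining descriptors that cannot be in the set. |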
| |
| |
| @node Synchronizing I/O |
| @section Synchronizing I/O operations |
| |
| @cindex synchronizing |
| In most modern operating systems, the normal I/O operations are not |
| executed synchronously. I.e., even if a @code{write} system call |
| returns, this does not mean the data is actually written to the media, |
| e.g., the disk. |
| |
| In situations where synchronization points are necessary, you can use |
| special functions which ensure that all operations finish before |
| they return. |
| |
| @comment unistd.h |
| @comment X/Open |
| @deftypefun void sync (void) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| A call to this function will not return as long as there is data which |
| has not been written to the device. All dirty buffers in the kernel will |
| be written and so an overall consistent system can be achieved (if no |
| other process in parallel writes data). |
| |
| A prototype for @code{sync} can be found in @file{unistd.h}. |
| @end deftypefun |
| |
| Programs more often want to ensure that data written to a given file is |
| committed, rather than all data in the system. For this, @code{sync} is overkill. |
| |
| |
| @comment unistd.h |
| @comment POSIX |
| @deftypefun int fsync (int @var{fildes}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| The @code{fsync} function can be used to make sure all data associated with |
| the open file @var{fildes} is written to the device associated with the |
| descriptor. The function call does not return unless all actions have |
| finished. |
| |
| A prototype for @code{fsync} can be found in @file{unistd.h}. |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread allocates some resources (like memory, file |
| descriptors, semaphores or whatever) at the time @code{fsync} is |
| called. If the thread gets canceled these resources stay allocated |
| until the program ends. To avoid this, calls to @code{fsync} should be |
| protected using cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| |
| The return value of the function is zero if no error occurred. Otherwise |
| it is @math{-1} and the global variable @code{errno} is set to one of the |
| following values: |
| @table @code |
| @item EBADF |
| The descriptor @var{fildes} is not valid. |
| |
| @item EINVAL |
| No synchronization is possible since the system does not implement this. |
| @end table |
| @end deftypefun |
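| |
| A typical use is to commit a freshly written file before reporting |
| success. The following is only a sketch under simple assumptions: the |
| helper name @code{write_file_durably} and the mode @code{0644} are |
| invented for this illustration, and it does not synchronize the |
| directory that contains the file. |

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the N bytes at DATA to the file PATH and
   do not report success until fsync has returned.  Returns 0 on
   success and -1 on failure.  */
int
write_file_durably (const char *path, const void *data, size_t n)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;

  const char *p = data;
  while (n > 0)
    {
      ssize_t written = write (fd, p, n);
      if (written < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted: try again.  */
          int saved = errno;
          close (fd);
          errno = saved;
          return -1;
        }
      p += written;
      n -= (size_t) written;
    }

  /* write returning does not mean the data reached the device.  */
  if (fsync (fd) < 0)
    {
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }
  return close (fd);
}
```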
| |
| Sometimes it is not even necessary to write all data associated with a |
| file descriptor. E.g., in database files which do not change in size it |
| is enough to write all the file content data to the device. |
| Meta-information, like the modification time etc., is not that important, |
| and leaving such information uncommitted does not prevent a successful |
| recovery of the file in case of a problem. |
| |
| @comment unistd.h |
| @comment POSIX |
| @deftypefun int fdatasync (int @var{fildes}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| When a call to the @code{fdatasync} function returns, it is ensured |
| that all of the file data is written to the device. For all pending I/O |
| operations, the parts guaranteeing data integrity have finished. |
| |
| Not all systems implement the @code{fdatasync} operation. On systems |
| missing this functionality @code{fdatasync} is emulated by a call to |
| @code{fsync} since the performed actions are a superset of those |
| required by @code{fdatasync}. |
| |
| The prototype for @code{fdatasync} is in @file{unistd.h}. |
| |
| The return value of the function is zero if no error occurred. Otherwise |
| it is @math{-1} and the global variable @code{errno} is set to one of the |
| following values: |
| @table @code |
| @item EBADF |
| The descriptor @var{fildes} is not valid. |
| |
| @item EINVAL |
| No synchronization is possible since the system does not implement this. |
| @end table |
| @end deftypefun |
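| |
| The database case described above might look like the following |
| sketch. The helper name @code{overwrite_in_place} is hypothetical. It |
| uses @code{pwrite} to replace bytes inside an existing file without |
| changing its size, and for brevity it does not retry after |
| @code{EINTR}. |

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace SIZE bytes at OFFSET in the existing
   file PATH with the bytes at DATA, then commit the file data.
   Returns 0 on success and -1 on failure.  */
int
overwrite_in_place (const char *path, off_t offset,
                    const void *data, size_t size)
{
  int fd = open (path, O_WRONLY);   /* No O_CREAT, no O_TRUNC.  */
  if (fd < 0)
    return -1;

  const char *p = data;
  int result = 0;
  while (size > 0)
    {
      ssize_t written = pwrite (fd, p, size, offset);
      if (written < 0)
        {
          result = -1;
          break;
        }
      p += written;
      offset += written;
      size -= (size_t) written;
    }

  /* Only the file contents matter here, so fdatasync is enough.  */
  if (result == 0 && fdatasync (fd) < 0)
    result = -1;

  int saved = errno;
  close (fd);
  errno = saved;
  return result;
}
```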
| |
| |
| @node Asynchronous I/O |
| @section Perform I/O Operations in Parallel |
| |
| The POSIX.1b standard defines a new set of I/O operations which can |
| significantly reduce the time an application spends waiting for I/O. The |
| new functions allow a program to initiate one or more I/O operations and |
| then immediately resume normal work while the I/O operations are |
| executed in parallel. This functionality is available if the |
| @file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}. |
| |
| These functions are part of the library with realtime functions named |
| @file{librt}. They are not actually part of the @file{libc} binary. |
| The implementation of these functions can be done using support in the |
| kernel (if available) or using an implementation based on threads at |
| userlevel. In the latter case it might be necessary to link applications |
| with the thread library @file{libpthread} in addition to @file{librt}. |
| |
| All AIO operations operate on files which were opened previously. There |
| might be arbitrarily many operations running for one file. The |
| asynchronous I/O operations are controlled using a data structure named |
| @code{struct aiocb} (@dfn{AIO control block}). It is defined in |
| @file{aio.h} as follows. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftp {Data Type} {struct aiocb} |
| The POSIX.1b standard mandates that the @code{struct aiocb} structure |
| contains at least the members described in the following table. There |
| might be more elements which are used by the implementation, but |
| depending upon these elements is not portable and is highly deprecated. |
| |
| @table @code |
| @item int aio_fildes |
| This element specifies the file descriptor to be used for the |
| operation. It must be a legal descriptor, otherwise the operation will |
| fail. |
| |
| The device on which the file is opened must allow the seek operation. |
| I.e., it is not possible to use any of the AIO operations on devices |
| like terminals where an @code{lseek} call would lead to an error. |
| |
| @item off_t aio_offset |
| This element specifies the offset in the file at which the operation (input |
| or output) is performed. Since the operations are carried out in arbitrary |
| order and more than one operation for one file descriptor can be |
| started, one cannot expect a current read/write position of the file |
| descriptor. |
| |
| @item volatile void *aio_buf |
| This is a pointer to the buffer with the data to be written or the place |
| where the read data is stored. |
| |
| @item size_t aio_nbytes |
| This element specifies the length of the buffer pointed to by @code{aio_buf}. |
| |
| @item int aio_reqprio |
| If the platform has defined @code{_POSIX_PRIORITIZED_IO} and |
| @code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are |
| processed based on the current scheduling priority. The |
| @code{aio_reqprio} element can then be used to lower the priority of the |
| AIO operation. |
| |
| @item struct sigevent aio_sigevent |
| This element specifies how the calling process is notified once the |
| operation terminates. If the @code{sigev_notify} element is |
| @code{SIGEV_NONE}, no notification is sent. If it is @code{SIGEV_SIGNAL}, |
| the signal determined by @code{sigev_signo} is sent. Otherwise, |
| @code{sigev_notify} must be @code{SIGEV_THREAD}. In this case, a thread |
| is created which starts executing the function pointed to by |
| @code{sigev_notify_function}. |
| |
| @item int aio_lio_opcode |
| This element is only used by the @code{lio_listio} and |
| @code{lio_listio64} functions. Since these functions allow an |
| arbitrary number of operations to start at once, and each operation can be |
| input or output (or nothing), the information must be stored in the |
| control block. The possible values are: |
| |
| @vtable @code |
| @item LIO_READ |
| Start a read operation. Read from the file at position |
| @code{aio_offset} and store the next @code{aio_nbytes} bytes in the |
| buffer pointed to by @code{aio_buf}. |
| |
| @item LIO_WRITE |
| Start a write operation. Write @code{aio_nbytes} bytes starting at |
| @code{aio_buf} into the file starting at position @code{aio_offset}. |
| |
| @item LIO_NOP |
| Do nothing for this control block. This value is useful sometimes when |
| an array of @code{struct aiocb} values contains holes, i.e., some of the |
| values must not be handled although the whole array is presented to the |
| @code{lio_listio} function. |
| @end vtable |
| @end table |
| |
| When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a |
| 32 bit machine, this type is in fact @code{struct aiocb64}, since the LFS |
| interface transparently replaces the @code{struct aiocb} definition. |
| @end deftp |
| |
| For use with the AIO functions defined in the LFS, there is a similar type |
| defined which replaces the types of the appropriate members with larger |
| types but otherwise is equivalent to @code{struct aiocb}. Particularly, |
| all member names are the same. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftp {Data Type} {struct aiocb64} |
| @table @code |
| @item int aio_fildes |
| This element specifies the file descriptor to be used for the |
| operation. It must be a legal descriptor, otherwise the operation will |
| fail. |
| |
| The device on which the file is opened must allow the seek operation. |
| I.e., it is not possible to use any of the AIO operations on devices |
| like terminals where an @code{lseek} call would lead to an error. |
| |
| @item off64_t aio_offset |
| This element specifies the offset in the file at which the operation (input |
| or output) is performed. Since the operations are carried out in arbitrary |
| order and more than one operation for one file descriptor can be |
| started, one cannot expect a current read/write position of the file |
| descriptor. |
| |
| @item volatile void *aio_buf |
| This is a pointer to the buffer with the data to be written or the place |
| where the read data is stored. |
| |
| @item size_t aio_nbytes |
| This element specifies the length of the buffer pointed to by @code{aio_buf}. |
| |
| @item int aio_reqprio |
| If the platform has defined @code{_POSIX_PRIORITIZED_IO} and |
| @code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are |
| processed based on the current scheduling priority. The |
| @code{aio_reqprio} element can then be used to lower the priority of the |
| AIO operation. |
| |
| @item struct sigevent aio_sigevent |
| This element specifies how the calling process is notified once the |
| operation terminates. If the @code{sigev_notify} element is |
| @code{SIGEV_NONE}, no notification is sent. If it is @code{SIGEV_SIGNAL}, |
| the signal determined by @code{sigev_signo} is sent. Otherwise, |
| @code{sigev_notify} must be @code{SIGEV_THREAD}. In this case, a thread |
| is created which starts executing the function pointed to by |
| @code{sigev_notify_function}. |
| |
| @item int aio_lio_opcode |
| This element is only used by the @code{lio_listio} and |
| @code{lio_listio64} functions. Since these functions allow an |
| arbitrary number of operations to start at once, and since each operation can be |
| input or output (or nothing), the information must be stored in the |
| control block. See the description of @code{struct aiocb} for a description |
| of the possible values. |
| @end table |
| |
| When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a |
| 32 bit machine, this type is available under the name @code{struct |
| aiocb}, since the LFS transparently replaces the old interface. |
| @end deftp |
| |
| @menu |
| * Asynchronous Reads/Writes:: Asynchronous Read and Write Operations. |
| * Status of AIO Operations:: Getting the Status of AIO Operations. |
| * Synchronizing AIO Operations:: Getting into a consistent state. |
| * Cancel AIO Operations:: Cancellation of AIO Operations. |
| * Configuration of AIO:: How to optimize the AIO implementation. |
| @end menu |
| |
| @node Asynchronous Reads/Writes |
| @subsection Asynchronous Read and Write Operations |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int aio_read (struct aiocb *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| @c Calls aio_enqueue_request. |
| @c aio_enqueue_request @asulock @ascuheap @aculock @acsmem |
| @c pthread_self ok |
| @c pthread_getschedparam @asulock @aculock |
| @c lll_lock (pthread descriptor's lock) @asulock @aculock |
| @c sched_getparam ok |
| @c sched_getscheduler ok |
| @c lll_unlock @aculock |
| @c pthread_mutex_lock (aio_requests_mutex) @asulock @aculock |
| @c get_elem @ascuheap @acsmem [@asucorrupt @acucorrupt] |
| @c realloc @ascuheap @acsmem |
| @c calloc @ascuheap @acsmem |
| @c aio_create_helper_thread @asulock @ascuheap @aculock @acsmem |
| @c pthread_attr_init ok |
| @c pthread_attr_setdetachstate ok |
| @c pthread_get_minstack ok |
| @c pthread_attr_setstacksize ok |
| @c sigfillset ok |
| @c memset ok |
| @c sigdelset ok |
| @c SYSCALL rt_sigprocmask ok |
| @c pthread_create @asulock @ascuheap @aculock @acsmem |
| @c lll_lock (default_pthread_attr_lock) @asulock @aculock |
| @c alloca/malloc @ascuheap @acsmem |
| @c lll_unlock @aculock |
| @c allocate_stack @asulock @ascuheap @aculock @acsmem |
| @c getpagesize dup |
| @c lll_lock (default_pthread_attr_lock) @asulock @aculock |
| @c lll_unlock @aculock |
| @c _dl_allocate_tls @ascuheap @acsmem |
| @c _dl_allocate_tls_storage @ascuheap @acsmem |
| @c memalign @ascuheap @acsmem |
| @c memset ok |
| @c allocate_dtv dup |
| @c free @ascuheap @acsmem |
| @c allocate_dtv @ascuheap @acsmem |
| @c calloc @ascuheap @acsmem |
| @c INSTALL_DTV ok |
| @c list_add dup |
| @c get_cached_stack |
| @c lll_lock (stack_cache_lock) @asulock @aculock |
| @c list_for_each ok |
| @c list_entry dup |
| @c FREE_P dup |
| @c stack_list_del dup |
| @c stack_list_add dup |
| @c lll_unlock @aculock |
| @c _dl_allocate_tls_init ok |
| @c GET_DTV ok |
| @c mmap ok |
| @c atomic_increment_val ok |
| @c munmap ok |
| @c change_stack_perm ok |
| @c mprotect ok |
| @c mprotect ok |
| @c stack_list_del dup |
| @c _dl_deallocate_tls dup |
| @c munmap ok |
| @c THREAD_COPY_STACK_GUARD ok |
| @c THREAD_COPY_POINTER_GUARD ok |
| @c atomic_exchange_acq ok |
| @c lll_futex_wake ok |
| @c deallocate_stack @asulock @ascuheap @aculock @acsmem |
| @c lll_lock (state_cache_lock) @asulock @aculock |
| @c stack_list_del ok |
| @c atomic_write_barrier ok |
| @c list_del ok |
| @c atomic_write_barrier ok |
| @c queue_stack @ascuheap @acsmem |
| @c stack_list_add ok |
| @c atomic_write_barrier ok |
| @c list_add ok |
| @c atomic_write_barrier ok |
| @c free_stacks @ascuheap @acsmem |
| @c list_for_each_prev_safe ok |
| @c list_entry ok |
| @c FREE_P ok |
| @c stack_list_del dup |
| @c _dl_deallocate_tls dup |
| @c munmap ok |
| @c _dl_deallocate_tls @ascuheap @acsmem |
| @c free @ascuheap @acsmem |
| @c lll_unlock @aculock |
| @c create_thread @asulock @ascuheap @aculock @acsmem |
| @c td_eventword |
| @c td_eventmask |
| @c do_clone @asulock @ascuheap @aculock @acsmem |
| @c PREPARE_CREATE ok |
| @c lll_lock (pd->lock) @asulock @aculock |
| @c atomic_increment ok |
| @c clone ok |
| @c atomic_decrement ok |
| @c atomic_exchange_acq ok |
| @c lll_futex_wake ok |
| @c deallocate_stack dup |
| @c sched_setaffinity ok |
| @c tgkill ok |
| @c sched_setscheduler ok |
| @c atomic_compare_and_exchange_bool_acq ok |
| @c nptl_create_event ok |
| @c lll_unlock (pd->lock) @aculock |
| @c free @ascuheap @acsmem |
| @c pthread_attr_destroy ok (cpuset won't be set, so free isn't called) |
| @c add_request_to_runlist ok |
| @c pthread_cond_signal ok |
| @c aio_free_request ok |
| @c pthread_mutex_unlock @aculock |
| |
| @c (in the new thread, initiated with clone) |
| @c start_thread ok |
| @c HP_TIMING_NOW ok |
| @c ctype_init @mtslocale |
| @c atomic_exchange_acq ok |
| @c lll_futex_wake ok |
| @c sigemptyset ok |
| @c sigaddset ok |
| @c setjmp ok |
| @c CANCEL_ASYNC -> pthread_enable_asynccancel ok |
| @c do_cancel ok |
| @c pthread_unwind ok |
| @c Unwind_ForcedUnwind or longjmp ok [@ascuheap @acsmem?] |
| @c lll_lock @asulock @aculock |
| @c lll_unlock @asulock @aculock |
| @c CANCEL_RESET -> pthread_disable_asynccancel ok |
| @c lll_futex_wait ok |
| @c ->start_routine ok ----- |
| @c call_tls_dtors @asulock @ascuheap @aculock @acsmem |
| @c user-supplied dtor |
| @c rtld_lock_lock_recursive (dl_load_lock) @asulock @aculock |
| @c rtld_lock_unlock_recursive @aculock |
| @c free @ascuheap @acsmem |
| @c nptl_deallocate_tsd @ascuheap @acsmem |
| @c tsd user-supplied dtors ok |
| @c free @ascuheap @acsmem |
| @c libc_thread_freeres |
| @c libc_thread_subfreeres ok |
| @c atomic_decrement_and_test ok |
| @c td_eventword ok |
| @c td_eventmask ok |
| @c atomic_compare_exchange_bool_acq ok |
| @c nptl_death_event ok |
| @c lll_robust_dead ok |
| @c getpagesize ok |
| @c madvise ok |
| @c free_tcb @asulock @ascuheap @aculock @acsmem |
| @c free @ascuheap @acsmem |
| @c deallocate_stack @asulock @ascuheap @aculock @acsmem |
| @c lll_futex_wait ok |
| @c exit_thread_inline ok |
| @c syscall(exit) ok |
| |
| This function initiates an asynchronous read operation. It |
| immediately returns after the operation was enqueued or when an |
| error was encountered. |
| |
| The first @code{aiocbp->aio_nbytes} bytes of the file for which |
| @code{aiocbp->aio_fildes} is a descriptor are written to the buffer |
| starting at @code{aiocbp->aio_buf}. Reading starts at the absolute |
| position @code{aiocbp->aio_offset} in the file. |
| |
| If prioritized I/O is supported by the platform, the |
| @code{aiocbp->aio_reqprio} value is used to adjust the priority before |
| the request is actually enqueued. |
| |
| The calling process is notified about the termination of the read |
| request according to the @code{aiocbp->aio_sigevent} value. |
| |
| When @code{aio_read} returns, the return value is zero if no error |
| occurred that can be found before the request is enqueued. If such an |
| early error is found, the function returns @math{-1} and sets |
| @code{errno} to one of the following values: |
| |
| @table @code |
| @item EAGAIN |
| The request was not enqueued due to (temporarily) exceeded resource |
| limitations. |
| @item ENOSYS |
| The @code{aio_read} function is not implemented. |
| @item EBADF |
| The @code{aiocbp->aio_fildes} descriptor is not valid. This condition |
| need not be recognized before enqueueing the request and so this error |
| might also be signaled asynchronously. |
| @item EINVAL |
| The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is |
| invalid. This condition need not be recognized before enqueueing the |
| request and so this error might also be signaled asynchronously. |
| @end table |
| |
| If @code{aio_read} returns zero, the current status of the request |
| can be queried using the @code{aio_error} and @code{aio_return} functions. |
| As long as the value returned by @code{aio_error} is @code{EINPROGRESS}, |
| the operation has not yet completed. If @code{aio_error} returns zero, |
| the operation successfully terminated; otherwise the value is to be |
| interpreted as an error code. Once the operation has terminated, the result of |
| the operation can be obtained using a call to @code{aio_return}. The |
| returned value is the same as an equivalent call to @code{read} would |
| have returned. Possible error codes returned by @code{aio_error} are: |
| |
| @table @code |
| @item EBADF |
| The @code{aiocbp->aio_fildes} descriptor is not valid. |
| @item ECANCELED |
| The operation was canceled before it finished |
| (@pxref{Cancel AIO Operations}). |
| @item EINVAL |
| The @code{aiocbp->aio_offset} value is invalid. |
| @end table |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is in fact @code{aio_read64} since the LFS interface transparently |
| replaces the normal implementation. |
| @end deftypefun |
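
As a minimal sketch of the sequence described above, the following
hypothetical helper @code{read_file_async} (its name and path-based
interface are assumptions made for illustration, not part of the library)
submits one request with @code{aio_read}, waits until @code{aio_error} no
longer reports @code{EINPROGRESS}, and then collects the result with
@code{aio_return}. It disables notification by assuming
@code{SIGEV_NONE} and uses @code{aio_suspend}, described later in this
chapter, to avoid a busy loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the library): read up to LEN bytes
   at OFFSET from the file PATH with aio_read, then wait for the
   request to finish.  Returns the byte count, or -1 with errno set.  */
ssize_t
read_file_async (const char *path, void *buf, size_t len, off_t offset)
{
  assert (path != NULL && buf != NULL);

  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no notification */

  if (aio_read (&cb) == -1)
    {
      /* Early error: the request was never enqueued.  */
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }

  /* A real program could do other work here.  This sketch simply
     sleeps until the request leaves the EINPROGRESS state.  */
  const struct aiocb *const list[1] = { &cb };
  while (aio_error (&cb) == EINPROGRESS)
    aio_suspend (list, 1, NULL);

  int err = aio_error (&cb);      /* 0 or an errno-style code */
  ssize_t n = aio_return (&cb);   /* collected exactly once */
  close (fd);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}
```

Note that the control block and the buffer stay valid until the request
has terminated, which is why the helper does not return before that
point.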
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int aio_read64 (struct aiocb64 *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| This function is similar to the @code{aio_read} function. The only |
| difference is that on @w{32 bit} machines, the file descriptor should |
| be opened in the large file mode. Internally, @code{aio_read64} uses |
| functionality equivalent to @code{lseek64} (@pxref{File Position |
| Primitive}) to position the file descriptor correctly for the reading, |
| as opposed to @code{lseek} functionality used in @code{aio_read}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is available under the name @code{aio_read} and so transparently |
| replaces the interface for small files on 32 bit machines. |
| @end deftypefun |
| |
| To write data asynchronously to a file, there exists an equivalent pair |
| of functions with a very similar interface. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int aio_write (struct aiocb *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| This function initiates an asynchronous write operation. The function |
| call returns immediately after the operation has been enqueued, or |
| earlier if an error is encountered before that happens. |
| |
| The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at |
| @code{aiocbp->aio_buf} are written to the file for which |
| @code{aiocbp->aio_fildes} is a descriptor, starting at the absolute |
| position @code{aiocbp->aio_offset} in the file. |
| |
| If prioritized I/O is supported by the platform, the |
| @code{aiocbp->aio_reqprio} value is used to adjust the priority before |
| the request is actually enqueued. |
| |
| The calling process is notified about the termination of the write |
| request according to the @code{aiocbp->aio_sigevent} value. |
| |
| When @code{aio_write} returns, the return value is zero if no error |
| occurred that can be found before the request is enqueued. If such an |
| early error is found, the function returns @math{-1} and sets |
| @code{errno} to one of the following values: |
| |
| @table @code |
| @item EAGAIN |
| The request was not enqueued due to (temporarily) exceeded resource |
| limitations. |
| @item ENOSYS |
| The @code{aio_write} function is not implemented. |
| @item EBADF |
| The @code{aiocbp->aio_fildes} descriptor is not valid. This condition |
| need not be recognized before enqueueing the request, and so this error |
| might also be signaled asynchronously. |
| @item EINVAL |
| The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is |
| invalid. This condition need not be recognized before enqueueing the |
| request, and so this error might also be signaled asynchronously. |
| @end table |
| |
| If @code{aio_write} returns zero, the current status of the |
| request can be queried using the @code{aio_error} and @code{aio_return} |
| functions. As long as the value returned by @code{aio_error} is |
| @code{EINPROGRESS}, the operation has not yet completed. If |
| @code{aio_error} returns zero, the operation terminated successfully; |
| otherwise the value is to be interpreted as an error code. Once the |
| operation has terminated, its result can be obtained using a call |
| to @code{aio_return}. The returned value is the same as an equivalent |
| call to @code{write} would have returned. Possible error codes returned |
| by @code{aio_error} are: |
| |
| @table @code |
| @item EBADF |
| The @code{aiocbp->aio_fildes} descriptor is not valid. |
| @item ECANCELED |
| The operation was canceled before it finished |
| (@pxref{Cancel AIO Operations}). |
| @item EINVAL |
| The @code{aiocbp->aio_offset} value is invalid. |
| @end table |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is in fact @code{aio_write64} since the LFS interface transparently |
| replaces the normal implementation. |
| @end deftypefun |
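
The write path can be sketched in the same way. The helper
@code{write_file_async} below is hypothetical (name, path-based interface
and file mode are assumptions for illustration); it enqueues one request
with @code{aio_write}, waits for it to terminate, and returns the value
that an equivalent @code{write} call would have returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the library): write LEN bytes from
   DATA to the file PATH at OFFSET using aio_write.  Returns the number
   of bytes written, or -1 with errno set.  */
ssize_t
write_file_async (const char *path, const void *data, size_t len,
                  off_t offset)
{
  assert (path != NULL && data != NULL);

  int fd = open (path, O_WRONLY | O_CREAT, 0600);
  if (fd == -1)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = (void *) data;     /* aio_buf is not const-qualified */
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no notification */

  if (aio_write (&cb) == -1)
    {
      /* Early error: the request was never enqueued.  */
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }

  /* DATA must stay valid and unmodified until the request finishes.  */
  const struct aiocb *const list[1] = { &cb };
  while (aio_error (&cb) == EINPROGRESS)
    aio_suspend (list, 1, NULL);

  int err = aio_error (&cb);      /* 0 or an errno-style code */
  ssize_t n = aio_return (&cb);   /* collected exactly once */
  close (fd);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}
```

Because the request carries its own absolute offset, a second call with
offset 0 overwrites the start of the file regardless of what was written
before.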
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int aio_write64 (struct aiocb64 *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| This function is similar to the @code{aio_write} function. The only |
| difference is that on @w{32 bit} machines the file descriptor should |
| be opened in the large file mode. Internally @code{aio_write64} uses |
| functionality equivalent to @code{lseek64} (@pxref{File Position |
| Primitive}) to position the file descriptor correctly for the writing, |
| as opposed to @code{lseek} functionality used in @code{aio_write}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is available under the name @code{aio_write} and so transparently |
| replaces the interface for small files on 32 bit machines. |
| @end deftypefun |
| |
| Besides these functions with the more or less traditional interface, |
| POSIX.1b also defines a function which can initiate more than one |
| operation at a time, and which can handle freely mixed read and write |
| operations. It is therefore similar to a combination of @code{readv} and |
| @code{writev}. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| @c Call lio_listio_internal, that takes the aio_requests_mutex lock and |
| @c enqueues each request. Then, it waits for notification or prepares |
| @c for it before releasing the lock. Even though it performs memory |
| @c allocation and locking of its own, it doesn't add any classes of |
| @c safety issues that aren't already covered by aio_enqueue_request. |
| The @code{lio_listio} function can be used to enqueue an arbitrary |
| number of read and write requests at one time. The requests can all be |
| meant for the same file, all for different files, or any combination in |
| between. |
| |
| @code{lio_listio} gets the @var{nent} requests from the array pointed to |
| by @var{list}. The operation to be performed is determined by the |
| @code{aio_lio_opcode} member in each element of @var{list}. If this |
| field is @code{LIO_READ} a read operation is enqueued, similar to a call |
| of @code{aio_read} for this element of the array (except that the way |
| the termination is signalled is different, as we will see below). If |
| the @code{aio_lio_opcode} member is @code{LIO_WRITE} a write operation |
| is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP} |
| in which case this element of @var{list} is simply ignored. This |
| ``operation'' is useful in situations where one has a fixed array of |
| @code{struct aiocb} elements from which only a few need to be handled at |
| a time. Another situation is where the @code{lio_listio} call was |
| canceled before all requests are processed (@pxref{Cancel AIO |
| Operations}) and the remaining requests have to be reissued. |
| |
| The other members of each element of the array pointed to by |
| @var{list} must have values suitable for the operation as described in |
| the documentation for @code{aio_read} and @code{aio_write} above. |
| |
| The @var{mode} argument determines how @code{lio_listio} behaves after |
| having enqueued all the requests. If @var{mode} is @code{LIO_WAIT}, it |
| waits until all requests have terminated. Otherwise @var{mode} must be |
| @code{LIO_NOWAIT}, and in this case the function returns immediately after |
| having enqueued all the requests. The caller then gets a |
| notification of the termination of all requests according to the |
| @var{sig} parameter. If @var{sig} is @code{NULL}, no notification is |
| sent. Otherwise a signal is sent or a thread is started, just as |
| described in the description for @code{aio_read} or @code{aio_write}. |
| |
| If @var{mode} is @code{LIO_WAIT}, the return value of @code{lio_listio} |
| is @math{0} when all requests completed successfully. Otherwise the |
| function returns @math{-1} and @code{errno} is set accordingly. To find |
| out which request or requests failed, one has to use the @code{aio_error} |
| function on all the elements of the array @var{list}. |
| |
| In case @var{mode} is @code{LIO_NOWAIT}, the function returns @math{0} if |
| all requests were enqueued correctly. The current state of the requests |
| can be found using @code{aio_error} and @code{aio_return} as described |
| above. If @code{lio_listio} returns @math{-1} in this mode, the |
| global variable @code{errno} is set accordingly. If a request did not |
| yet terminate, a call to @code{aio_error} returns @code{EINPROGRESS}. If |
| the value is different, the request is finished and the error value (or |
| @math{0}) is returned and the result of the operation can be retrieved |
| using @code{aio_return}. |
| |
| Possible values for @code{errno} are: |
| |
| @table @code |
| @item EAGAIN |
| The resources necessary to queue all the requests are not available at |
| the moment. The error status for each element of @var{list} must be |
| checked to determine which request failed. |
| |
| Another reason could be that the system wide limit of AIO requests is |
| exceeded. This cannot be the case for the implementation on @gnusystems{} |
| since no arbitrary limits exist. |
| @item EINVAL |
| The @var{mode} parameter is invalid or @var{nent} is larger than |
| @code{AIO_LISTIO_MAX}. |
| @item EIO |
| One or more of the requested I/O operations failed. The error status of |
| each request should be checked to determine which one failed. |
| @item ENOSYS |
| The @code{lio_listio} function is not supported. |
| @end table |
| |
| If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels |
| a request, the error status for this request returned by |
| @code{aio_error} is @code{ECANCELED}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is in fact @code{lio_listio64} since the LFS interface |
| transparently replaces the normal implementation. |
| @end deftypefun |
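
The following sketch illustrates a batch submitted in @code{LIO_WAIT}
mode. The helper @code{read_two_chunks} is hypothetical (its name and
interface are assumptions for illustration). It fills a fixed array of
three control blocks, marks the middle one @code{LIO_NOP} so that it is
ignored, and then checks every remaining element with @code{aio_error}
and @code{aio_return}, as the text above requires when the call reports
a failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the library): read the first two
   LEN-byte chunks of PATH into FIRST and SECOND with one lio_listio
   call.  Returns the total number of bytes read, or -1 with errno set.  */
ssize_t
read_two_chunks (const char *path, char *first, char *second, size_t len)
{
  assert (path != NULL && first != NULL && second != NULL);

  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;

  struct aiocb cbs[3];
  memset (cbs, 0, sizeof cbs);
  for (int i = 0; i < 3; i++)
    {
      cbs[i].aio_fildes = fd;
      cbs[i].aio_nbytes = len;
      cbs[i].aio_lio_opcode = LIO_READ;
      cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
    }
  cbs[0].aio_buf = first;
  cbs[0].aio_offset = 0;
  cbs[1].aio_lio_opcode = LIO_NOP;      /* spare slot, ignored */
  cbs[2].aio_buf = second;
  cbs[2].aio_offset = (off_t) len;

  struct aiocb *const list[3] = { &cbs[0], &cbs[1], &cbs[2] };

  /* LIO_WAIT: return only after every request has terminated.  */
  int rc = lio_listio (LIO_WAIT, list, 3, NULL);
  int lio_errno = errno;
  if (rc == -1 && (lio_errno == EINVAL || lio_errno == ENOSYS))
    {
      close (fd);
      errno = lio_errno;
      return -1;                        /* nothing was enqueued */
    }

  /* Inspect each real request, whatever lio_listio returned.  */
  ssize_t total = 0;
  int first_err = 0;
  for (int i = 0; i < 3; i++)
    {
      if (cbs[i].aio_lio_opcode == LIO_NOP)
        continue;
      /* Normally a no-op after LIO_WAIT; guards against an early
         return such as EINTR.  */
      const struct aiocb *const one[1] = { &cbs[i] };
      while (aio_error (&cbs[i]) == EINPROGRESS)
        aio_suspend (one, 1, NULL);
      int err = aio_error (&cbs[i]);
      ssize_t n = aio_return (&cbs[i]);
      if (err != 0 && first_err == 0)
        first_err = err;
      else if (err == 0)
        total += n;
    }
  close (fd);

  if (first_err == 0 && rc == -1)
    first_err = lio_errno;              /* be conservative */
  if (first_err != 0)
    {
      errno = first_err;
      return -1;
    }
  return total;
}
```

The two reads target disjoint regions, so their relative order does not
matter; the sketch makes no assumption about the order in which the
requests are carried out.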
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int lio_listio64 (int @var{mode}, struct aiocb64 *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| This function is similar to the @code{lio_listio} function. The only |
| difference is that on @w{32 bit} machines, the file descriptor should |
| be opened in the large file mode. Internally, @code{lio_listio64} uses |
| functionality equivalent to @code{lseek64} (@pxref{File Position |
| Primitive}) to position the file descriptor correctly for the reading or |
| writing, as opposed to @code{lseek} functionality used in |
| @code{lio_listio}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is available under the name @code{lio_listio} and so |
| transparently replaces the interface for small files on 32 bit |
| machines. |
| @end deftypefun |
| |
| @node Status of AIO Operations |
| @subsection Getting the Status of AIO Operations |
| |
| As already described in the documentation of the functions in the last |
| section, it must be possible to get information about the status of an I/O |
| request. When the operation is performed truly asynchronously (as with |
| @code{aio_read} and @code{aio_write} and with @code{lio_listio} when the |
| mode is @code{LIO_NOWAIT}), one sometimes needs to know whether a |
| specific request has already terminated and, if so, what the result was. |
| The following two functions allow you to get this kind of information. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int aio_error (const struct aiocb *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function determines the error state of the request described by the |
| @code{struct aiocb} variable pointed to by @var{aiocbp}. If the |
| request has not yet terminated, the value returned is always |
| @code{EINPROGRESS}. Once the request has terminated, @code{aio_error} |
| returns either @math{0} if the request completed successfully, or the |
| value which would have been stored in the @code{errno} variable if the |
| request had been performed using @code{read}, @code{write}, or |
| @code{fsync}. |
| |
| The function can return @code{ENOSYS} if it is not implemented. It |
| could also return @code{EINVAL} if the @var{aiocbp} parameter does not |
| refer to an asynchronous operation whose return status is not yet known. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is in fact @code{aio_error64} since the LFS interface |
| transparently replaces the normal implementation. |
| @end deftypefun |
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function is similar to @code{aio_error} with the only difference |
| that the argument is a reference to a variable of type @code{struct |
| aiocb64}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is available under the name @code{aio_error} and so |
| transparently replaces the interface for small files on 32 bit |
| machines. |
| @end deftypefun |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun ssize_t aio_return (struct aiocb *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function can be used to retrieve the return status of the operation |
| carried out by the request described in the variable pointed to by |
| @var{aiocbp}. As long as the error status of this request as returned |
| by @code{aio_error} is @code{EINPROGRESS}, the return value of this |
| function is undefined. |
| |
| Once the request is finished, this function can be used exactly once to |
| retrieve the return value. Subsequent calls might lead to undefined |
| behavior. The return value itself is the value which would have been |
| returned by the @code{read}, @code{write}, or @code{fsync} call. |
| |
| The function can return @code{ENOSYS} if it is not implemented. It |
| could also return @code{EINVAL} if the @var{aiocbp} parameter does not |
| refer to an asynchronous operation whose return status is not yet known. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is in fact @code{aio_return64} since the LFS interface |
| transparently replaces the normal implementation. |
| @end deftypefun |
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun ssize_t aio_return64 (struct aiocb64 *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function is similar to @code{aio_return} with the only difference |
| that the argument is a reference to a variable of type @code{struct |
| aiocb64}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is available under the name @code{aio_return} and so |
| transparently replaces the interface for small files on 32 bit |
| machines. |
| @end deftypefun |
| |
| @node Synchronizing AIO Operations |
| @subsection Getting into a Consistent State |
| |
| When dealing with asynchronous operations it is sometimes necessary to |
| get into a consistent state. For AIO this means that one wants to |
| know whether a certain request or a group of requests has been processed. |
| This could be done by waiting for the notification sent by the system |
| after the operation terminated, but this sometimes would mean wasting |
| resources (mainly computation time). Instead POSIX.1b defines two |
| functions which will help with most kinds of consistency. |
| |
| The @code{aio_fsync} and @code{aio_fsync64} functions are only available |
| if the symbol @code{_POSIX_SYNCHRONIZED_IO} is defined in @file{unistd.h}. |
| |
| @cindex synchronizing |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| @c After fcntl to check that the FD is open, it calls |
| @c aio_enqueue_request. |
| Calling this function forces all I/O operations queued at the |
| time of the function call and operating on the file descriptor |
| @code{aiocbp->aio_fildes} into the synchronized I/O completion state |
| (@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns |
| immediately but the notification through the method described in |
| @code{aiocbp->aio_sigevent} will happen only after all requests for this |
| file descriptor have terminated and the file is synchronized. This also |
| means that requests for this very same file descriptor which are queued |
| after the synchronization request are not affected. |
| |
| If @var{op} is @code{O_DSYNC} the synchronization happens as with a call |
| to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and |
| the synchronization happens as with @code{fsync}. |
| |
| As long as the synchronization has not happened, a call to |
| @code{aio_error} with the reference to the object pointed to by |
| @var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is |
| done, @code{aio_error} returns @math{0} if the synchronization was |
| successful. Otherwise the value returned is the value to which the |
| @code{fsync} or @code{fdatasync} function would have set the |
| @code{errno} variable. In this case nothing can be assumed about the |
| consistency of the data written to this file descriptor. |
| |
| The return value of this function is @math{0} if the request was |
| successfully enqueued. Otherwise the return value is @math{-1} and |
| @code{errno} is set to one of the following values: |
| |
| @table @code |
| @item EAGAIN |
| The request could not be enqueued due to temporary lack of resources. |
| @item EBADF |
| The file descriptor @code{@var{aiocbp}->aio_fildes} is not valid. |
| @item EINVAL |
| The implementation does not support I/O synchronization or the @var{op} |
| parameter is other than @code{O_DSYNC} and @code{O_SYNC}. |
| @item ENOSYS |
| This function is not implemented. |
| @end table |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is in fact @code{aio_fsync64} since the LFS interface |
| transparently replaces the normal implementation. |
| @end deftypefun |
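
A possible use of @code{aio_fsync} is sketched below. The helper
@code{write_and_sync} and the small @code{wait_done} routine are
hypothetical names introduced for illustration. The sketch queues a
write, queues a synchronization request with @code{O_DSYNC} behind it on
the same descriptor, and reports success only if both the write and the
synchronization completed without error.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sleep until CB has left the EINPROGRESS state and return its final
   error status.  */
static int
wait_done (struct aiocb *cb)
{
  const struct aiocb *const list[1] = { cb };
  while (aio_error (cb) == EINPROGRESS)
    aio_suspend (list, 1, NULL);
  return aio_error (cb);
}

/* Hypothetical helper (not part of the library): write LEN bytes of
   DATA to PATH and force them into the synchronized I/O completion
   state.  Returns 0 on success, or -1 with errno set.  */
int
write_and_sync (const char *path, const void *data, size_t len)
{
  assert (path != NULL && data != NULL);

  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd == -1)
    return -1;

  struct aiocb wr, sy;
  memset (&wr, 0, sizeof wr);
  wr.aio_fildes = fd;
  wr.aio_buf = (void *) data;
  wr.aio_nbytes = len;
  wr.aio_sigevent.sigev_notify = SIGEV_NONE;
  memset (&sy, 0, sizeof sy);
  sy.aio_fildes = fd;
  sy.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_write (&wr) == -1)
    {
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }

  /* Queue the synchronization behind the write; it covers only the
     requests already queued for FD at this point.  */
  int serr = 0;
  if (aio_fsync (O_DSYNC, &sy) == -1)
    serr = errno;                 /* the sync request was not enqueued */

  int werr = wait_done (&wr);
  ssize_t n = aio_return (&wr);
  if (serr == 0)
    {
      serr = wait_done (&sy);
      (void) aio_return (&sy);
    }
  close (fd);

  if (werr != 0)
    {
      errno = werr;
      return -1;
    }
  if ((size_t) n != len)
    {
      errno = EIO;                /* treat a short write as a failure */
      return -1;
    }
  if (serr != 0)
    {
      errno = serr;
      return -1;
    }
  return 0;
}
```

Treating a short write as @code{EIO} is a simplification chosen for this
sketch; a real program might instead resubmit the remaining bytes.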
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| This function is similar to @code{aio_fsync} with the only difference |
| that the argument is a reference to a variable of type @code{struct |
| aiocb64}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is available under the name @code{aio_fsync} and so |
| transparently replaces the interface for small files on 32 bit |
| machines. |
| @end deftypefun |
| |
| Another method of synchronization is to wait until one or more requests of a |
| specific set have terminated. This could be achieved by letting the |
| @code{aio_*} functions notify the initiating process about the termination, |
| but in some situations this is not the ideal solution. In a program which |
| constantly updates clients somehow connected to the server, it is not |
| always the best solution to go round robin since some connections might |
| be slow. On the other hand, letting the @code{aio_*} functions notify the |
| caller might not be the best solution either: while the process is |
| preparing data for one client, it makes no sense to be interrupted by a |
| notification, since the new client will not be handled before the |
| current client is served. For situations like this, |
| @code{aio_suspend} should be used. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} |
| @c Take aio_requests_mutex, set up waitlist and requestlist, wait |
| @c for completion or timeout, and release the mutex. |
| When calling this function, the calling thread is suspended until at |
| least one of the requests pointed to by the @var{nent} elements of the |
| array @var{list} has completed. If any of the requests has already |
| completed at the time @code{aio_suspend} is called, the function returns |
| immediately. Whether a request has terminated or not is determined by |
| comparing the error status of the request with @code{EINPROGRESS}. If |
| an element of @var{list} is @code{NULL}, the entry is simply ignored. |
| |
| If no request has finished, the calling thread is suspended. If |
| @var{timeout} is @code{NULL}, the thread is not woken until a request |
| has finished. If @var{timeout} is not @code{NULL}, the thread remains |
| suspended at most as long as specified in @var{timeout}. If the timeout |
| expires before any request has finished, @code{aio_suspend} returns with |
| an error. |
| |
| The return value of the function is @math{0} if one or more requests |
| from the @var{list} have terminated. Otherwise the function returns |
| @math{-1} and @code{errno} is set to one of the following values: |
| |
| @table @code |
| @item EAGAIN |
| None of the requests from the @var{list} completed in the time specified |
| by @var{timeout}. |
| @item EINTR |
| A signal interrupted the @code{aio_suspend} function. This signal might |
| also be sent by the AIO implementation while signalling the termination |
| of one of the requests. |
| @item ENOSYS |
| The @code{aio_suspend} function is not implemented. |
| @end table |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is in fact @code{aio_suspend64} since the LFS interface |
| transparently replaces the normal implementation. |
| @end deftypefun |
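
The following sketch shows one way to combine @code{aio_suspend} with a
timeout. The helper @code{wait_for_any} is hypothetical (name, millisecond
interface and return convention are assumptions for illustration). It
retries when a signal interrupts the wait and then uses @code{aio_error}
to find an element that is no longer in the @code{EINPROGRESS} state.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/* Hypothetical helper (not part of the library): wait up to TIMEOUT_MS
   milliseconds for any of the NREQS requests in REQS to finish.  NULL
   entries are allowed and ignored.  Returns the index of a finished
   request, or -1 with errno set; errno is EAGAIN if the timeout
   expired first.  */
int
wait_for_any (struct aiocb *const reqs[], int nreqs, long timeout_ms)
{
  assert (reqs != NULL && nreqs > 0 && timeout_ms >= 0);

  struct timespec ts;
  ts.tv_sec = timeout_ms / 1000;
  ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

  /* Retry when a signal interrupts the wait.  For simplicity each
     retry restarts the full timeout.  */
  while (aio_suspend ((const struct aiocb *const *) reqs, nreqs, &ts) == -1)
    if (errno != EINTR)
      return -1;

  /* At least one request has terminated; find it.  */
  for (int i = 0; i < nreqs; i++)
    if (reqs[i] != NULL && aio_error (reqs[i]) != EINPROGRESS)
      return i;

  errno = EINVAL;               /* the list held no live request */
  return -1;
}
```

The caller keeps ownership of the control blocks and is still expected to
call @code{aio_return} once for the request whose index is returned.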
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} |
| This function is similar to @code{aio_suspend} with the only difference |
| that the argument is a reference to a variable of type @code{struct |
| aiocb64}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this |
| function is available under the name @code{aio_suspend} and so |
| transparently replaces the interface for small files on 32 bit |
| machines. |
| @end deftypefun |
| |
| @node Cancel AIO Operations |
| @subsection Cancellation of AIO Operations |
| |
| When one or more requests are asynchronously processed, it might be |
| useful in some situations to cancel a selected operation, e.g., if it |
| becomes obvious that the written data is no longer accurate and would |
| have to be overwritten soon. As an example, consider an application that |
| writes data to files and receives new data destined for a region of a |
| file that a still-enqueued request is about to update. |
| The POSIX AIO interface provides a function for this purpose, but it |
| cannot force the cancellation of the request. It is up to the |
| implementation to decide whether it is possible to cancel the operation |
| or not. Therefore using this function is merely a hint. |
| |
| @comment aio.h |
| @comment POSIX.1b |
| @deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| @c After fcntl to check the fd is open, hold aio_requests_mutex, call |
| @c aio_find_req_fd, aio_remove_request, then aio_notify and |
| @c aio_free_request each request before releasing the lock. |
| @c aio_notify calls aio_notify_only and free, besides cond signal or |
| @c similar. aio_notify_only calls pthread_attr_init, |
| @c pthread_attr_setdetachstate, malloc, pthread_create, |
| @c notify_func_wrapper, aio_sigqueue, getpid, raise. |
| @c notify_func_wraper calls aio_start_notify_thread, free and then the |
| @c notifier function. |
| The @code{aio_cancel} function can be used to cancel one or more |
| outstanding requests. If the @var{aiocbp} parameter is @code{NULL}, the |
| function tries to cancel all of the outstanding requests which would process |
| the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member |
| is @var{fildes}). If @var{aiocbp} is not @code{NULL}, @code{aio_cancel} |
| attempts to cancel the specific request pointed to by @var{aiocbp}. |
| |
| For requests which were successfully canceled, the normal notification |
| about the termination of the request should take place. I.e., depending |
| on the @code{struct sigevent} object which controls this, nothing |
| happens, a signal is sent or a thread is started. If the request cannot |
| be canceled, it terminates the usual way after performing the operation. |
| |
| After a request is successfully canceled, a call to @code{aio_error} with |
| a reference to this request as the parameter will return |
| @code{ECANCELED} and a call to @code{aio_return} will return @math{-1}. |
| If the request wasn't canceled and is still running the error status is |
| still @code{EINPROGRESS}. |
| |
| The return value of the function is @code{AIO_CANCELED} if there were |
| requests which hadn't terminated and which were successfully canceled. |
| If there are one or more requests left which couldn't be canceled, the |
| return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error} |
| must be used to find out which of the, perhaps multiple, requests (if |
| @var{aiocbp} is @code{NULL}) weren't successfully canceled. If all |
| requests already terminated at the time @code{aio_cancel} is called, the |
| return value is @code{AIO_ALLDONE}. |
| |
| If an error occurred during the execution of @code{aio_cancel}, the |
| function returns @math{-1} and sets @code{errno} to one of the following |
| values: |
| |
| @table @code |
| @item EBADF |
| The file descriptor @var{fildes} is not valid. |
| @item ENOSYS |
| @code{aio_cancel} is not implemented. |
| @end table |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is in fact @code{aio_cancel64} since the LFS interface |
| transparently replaces the normal implementation. |
| @end deftypefun |
| |
| @comment aio.h |
| @comment Unix98 |
| @deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}} |
| This function is similar to @code{aio_cancel} with the only difference |
| that the argument is a reference to a variable of type @code{struct |
| aiocb64}. |
| |
| When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this |
| function is available under the name @code{aio_cancel} and so |
| transparently replaces the interface for small files on 32 bit |
| machines. |
| @end deftypefun |
| |
| @node Configuration of AIO |
| @subsection How to optimize the AIO implementation |
| |
| The POSIX standard does not specify how the AIO functions are |
| implemented. They could be system calls, but it is also possible to |
| emulate them at userlevel. |
| |
| At the time of this writing, the available implementation is a userlevel |
| implementation which uses threads for handling the enqueued requests. |
| While this implementation requires making some decisions about |
| limitations, hard limitations are something which is best avoided |
| in @theglibc{}. Therefore, @theglibc{} provides a means |
| for tuning the AIO implementation according to the individual use. |
| |
| @comment aio.h |
| @comment GNU |
| @deftp {Data Type} {struct aioinit} |
| This data type is used to pass the configuration or tunable parameters |
| to the implementation. The program has to initialize the members of |
| this struct and pass it to the implementation using the @code{aio_init} |
| function. |
| |
| @table @code |
| @item int aio_threads |
| This member specifies the maximal number of threads which may be used |
| at any one time. |
| @item int aio_num |
| This number provides an estimate on the maximal number of simultaneously |
| enqueued requests. |
| @item int aio_locks |
| Unused. |
| @item int aio_usedba |
| Unused. |
| @item int aio_debug |
| Unused. |
| @item int aio_numusers |
| Unused. |
| @item int aio_reserved[2] |
| Unused. |
| @end table |
| @end deftp |
| |
| @comment aio.h |
| @comment GNU |
| @deftypefun void aio_init (const struct aioinit *@var{init}) |
| @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{}}@acunsafe{@aculock{}}} |
| @c All changes to global objects are guarded by aio_requests_mutex. |
| This function must be called before any other AIO function. Calling it |
| is completely voluntary, as it is only meant to help the AIO |
| implementation perform better. |
| |
| Before calling the @code{aio_init} function, the members of a variable of |
| type @code{struct aioinit} must be initialized. Then a reference to |
| this variable is passed as the parameter to @code{aio_init}, which itself |
| may or may not pay attention to the hints. |
| |
| The function has no return value and no error cases are defined. It is |
| an extension which follows a proposal from the SGI implementation in |
| @w{Irix 6}. It is not covered by POSIX.1b or Unix98. |
| @end deftypefun |
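
A minimal sketch of such a call follows. The helper name
@code{configure_aio} and the values a caller might pass are assumptions
for illustration. Because @code{aio_init} is a GNU extension, the sketch
assumes that @code{_GNU_SOURCE} has to be defined before the first
header is included. All unused members are cleared to zero.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not part of the library): pass tuning hints to
   the AIO implementation.  It is meant to be called once, before any
   other AIO function is used.  */
void
configure_aio (int max_threads, int expected_requests)
{
  assert (max_threads > 0 && expected_requests > 0);

  struct aioinit init;
  memset (&init, 0, sizeof init);       /* clear the unused members */
  init.aio_threads = max_threads;       /* upper bound on helper threads */
  init.aio_num = expected_requests;     /* expected simultaneous requests */

  aio_init (&init);                     /* hints only; no error reporting */
}
```

A program might call, for example, @code{configure_aio (4, 32)} at
startup; since the values are only hints, the implementation is free to
adjust or ignore them.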
| |
| @node Control Operations |
| @section Control Operations on Files |
| |
| @cindex control operations on files |
| @cindex @code{fcntl} function |
| This section describes how you can perform various other operations on |
| file descriptors, such as inquiring about or setting flags describing |
| the status of the file descriptor, manipulating record locks, and the |
| like. All of these operations are performed by the function @code{fcntl}. |
| |
| The second argument to the @code{fcntl} function is a command that |
| specifies which operation to perform. The function and macros that name |
| various flags that are used with it are declared in the header file |
| @file{fcntl.h}. Many of these flags are also used by the @code{open} |
| function; see @ref{Opening and Closing Files}. |
| @pindex fcntl.h |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| The @code{fcntl} function performs the operation specified by |
| @var{command} on the file descriptor @var{filedes}. Some commands |
| require additional arguments to be supplied. These additional arguments |
| and the return value and error conditions are given in the detailed |
| descriptions of the individual commands. |
| |
| Briefly, here is a list of what the various commands are. |
| |
| @table @code |
| @item F_DUPFD |
| Duplicate the file descriptor (return another file descriptor pointing |
| to the same open file). @xref{Duplicating Descriptors}. |
| |
| @item F_GETFD |
| Get flags associated with the file descriptor. @xref{Descriptor Flags}. |
| |
| @item F_SETFD |
| Set flags associated with the file descriptor. @xref{Descriptor Flags}. |
| |
| @item F_GETFL |
| Get flags associated with the open file. @xref{File Status Flags}. |
| |
| @item F_SETFL |
| Set flags associated with the open file. @xref{File Status Flags}. |
| |
| @item F_GETLK |
| Get information about a file lock. @xref{File Locks}. |
| |
| @item F_SETLK |
| Set or clear a file lock. @xref{File Locks}. |
| |
| @item F_SETLKW |
| Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. |
| |
| @item F_GETOWN |
| Get process or process group ID to receive @code{SIGIO} signals. |
| @xref{Interrupt Input}. |
| |
| @item F_SETOWN |
| Set process or process group ID to receive @code{SIGIO} signals. |
| @xref{Interrupt Input}. |
| @end table |
| |
| This function is a cancellation point in multi-threaded programs. This |
| is a problem if the thread allocates some resources (like memory, file |
| descriptors, semaphores or whatever) at the time @code{fcntl} is |
| called. If the thread gets canceled these resources stay allocated |
| until the program ends. To avoid this, calls to @code{fcntl} should be |
| protected using cancellation handlers. |
| @c ref pthread_cleanup_push / pthread_cleanup_pop |
| @end deftypefun |
| |
| |
| @node Duplicating Descriptors |
| @section Duplicating Descriptors |
| |
| @cindex duplicating file descriptors |
| @cindex redirecting input and output |
| |
| You can @dfn{duplicate} a file descriptor, or allocate another file |
| descriptor that refers to the same open file as the original. Duplicate |
| descriptors share one file position and one set of file status flags |
| (@pxref{File Status Flags}), but each has its own set of file descriptor |
| flags (@pxref{Descriptor Flags}). |
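The sharing rules above can be checked with a short sketch. It assumes a POSIX system with a writable @file{/tmp}; the function name @code{dup_shares_position} and the scratch file name are made up for illustration and are not part of the library.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Return 1 if a duplicate made with dup shares the file position of
   the original but keeps its own descriptor flags, 0 if it does not,
   or -1 if the experiment itself could not be carried out.  */
int
dup_shares_position (void)
{
  char name[] = "/tmp/dupdemoXXXXXX";   /* hypothetical scratch file */
  int fd = mkstemp (name);
  if (fd < 0)
    return -1;
  unlink (name);                        /* the open descriptor stays usable */

  int copy = dup (fd);
  if (copy < 0)
    {
      close (fd);
      return -1;
    }

  int result = -1;
  /* Advance the position through FD, and set FD_CLOEXEC on FD only.
     (Overwriting the whole flag word is acceptable in this throwaway
     demonstration, but not in general.)  */
  if (write (fd, "abc", 3) == 3
      && fcntl (fd, F_SETFD, FD_CLOEXEC) != -1)
    {
      off_t pos = lseek (copy, 0, SEEK_CUR);
      int copy_flags = fcntl (copy, F_GETFD);
      if (pos != -1 && copy_flags != -1)
        result = (pos == 3 && (copy_flags & FD_CLOEXEC) == 0);
    }

  close (copy);
  close (fd);
  return result;
}
```

The position seen through @code{copy} moves because the write went through the same open file, while the @code{FD_CLOEXEC} flag set on @code{fd} does not appear on @code{copy}.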
| |
| The major use of duplicating a file descriptor is to implement |
| @dfn{redirection} of input or output: that is, to change the |
| file or pipe that a particular file descriptor corresponds to. |
| |
| You can perform this operation using the @code{fcntl} function with the |
| @code{F_DUPFD} command, but there are also convenient functions |
| @code{dup} and @code{dup2} for duplicating descriptors. |
| |
| @pindex unistd.h |
| @pindex fcntl.h |
| The @code{fcntl} function and flags are declared in @file{fcntl.h}, |
| while prototypes for @code{dup} and @code{dup2} are in the header file |
| @file{unistd.h}. |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftypefun int dup (int @var{old}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function copies descriptor @var{old} to the first available |
| descriptor number (the first number not currently open). It is |
| equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}. |
| @end deftypefun |
| |
| @comment unistd.h |
| @comment POSIX.1 |
| @deftypefun int dup2 (int @var{old}, int @var{new}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| This function copies the descriptor @var{old} to descriptor number |
| @var{new}. |
| |
| If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it |
| does not close @var{new}. Otherwise, the new duplicate of @var{old} |
| replaces any previous meaning of descriptor @var{new}, as if @var{new} |
| were closed first. |
| |
| If @var{old} and @var{new} are different numbers, and @var{old} is a |
| valid descriptor number, then @code{dup2} is equivalent to: |
| |
| @smallexample |
| close (@var{new}); |
| fcntl (@var{old}, F_DUPFD, @var{new}) |
| @end smallexample |
| |
| However, @code{dup2} does this atomically; there is no instant in the |
| middle of calling @code{dup2} at which @var{new} is closed and not yet a |
| duplicate of @var{old}. |
| @end deftypefun |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_DUPFD |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| copy the file descriptor given as the first argument. |
| |
| The form of the call in this case is: |
| |
| @smallexample |
| fcntl (@var{old}, F_DUPFD, @var{next-filedes}) |
| @end smallexample |
| |
| The @var{next-filedes} argument is of type @code{int} and specifies that |
| the file descriptor returned should be the next available one greater |
| than or equal to this value. |
| |
| The return value from @code{fcntl} with this command is normally the value |
| of the new file descriptor. A return value of @math{-1} indicates an |
| error. The following @code{errno} error conditions are defined for |
| this command: |
| |
| @table @code |
| @item EBADF |
| The @var{old} argument is invalid. |
| |
| @item EINVAL |
| The @var{next-filedes} argument is invalid. |
| |
| @item EMFILE |
| There are no more file descriptors available---your program is already |
| using the maximum. In BSD and GNU, the maximum is controlled by a |
| resource limit that can be changed; @pxref{Limits on Resources}, for |
| more information about the @code{RLIMIT_NOFILE} limit. |
| @end table |
| |
| @code{ENFILE} is not a possible error code for @code{dup2} because |
| @code{dup2} does not create a new opening of a file; duplicate |
| descriptors do not count toward the limit which @code{ENFILE} |
| indicates. @code{EMFILE} is possible because it refers to the limit on |
| distinct descriptor numbers in use in one process. |
| @end deftypevr |
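As a small illustration of the @var{next-filedes} argument, the following sketch wraps the call in a helper. The name @code{dup_at_least} is hypothetical; only @code{fcntl} and @code{F_DUPFD} come from the library.

```c
#include <fcntl.h>

/* Duplicate OLD onto the lowest free descriptor number that is at
   least MINIMUM.  Return the new descriptor, or -1 with errno set.  */
int
dup_at_least (int old, int minimum)
{
  return fcntl (old, F_DUPFD, minimum);
}
```

A program that wants to keep the low descriptor numbers free for later redirection might call @code{dup_at_least (fd, 10)} and then close the original.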
| |
| Here is an example showing how to use @code{dup2} to do redirection. |
| Typically, redirection of the standard streams (like @code{stdin}) is |
| done by a shell or shell-like program before calling one of the |
| @code{exec} functions (@pxref{Executing a File}) to execute a new |
| program in a child process. When the new program is executed, it |
| creates and initializes the standard streams to point to the |
| corresponding file descriptors, before its @code{main} function is |
| invoked. |
| |
| So, to redirect standard input to a file, the shell could do something |
| like: |
| |
| @smallexample |
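| pid = fork (); |
| if (pid == 0) |
|   @{ |
|     char *filename; |
|     char *program; |
|     char **argv; |
|     int file; |
|     @dots{} |
|     file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY)); |
|     if (file < 0) |
|       _exit (EXIT_FAILURE); |
|     if (dup2 (file, STDIN_FILENO) < 0) |
|       _exit (EXIT_FAILURE); |
|     TEMP_FAILURE_RETRY (close (file)); |
|     execv (program, argv); |
|     /* @r{@code{execv} returns only if it fails.} */ |
|     _exit (EXIT_FAILURE); |
|   @} |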
| @end smallexample |
| |
| There is also a more detailed example showing how to implement redirection |
| in the context of a pipeline of processes in @ref{Launching Jobs}. |
| |
| |
| @node Descriptor Flags |
| @section File Descriptor Flags |
| @cindex file descriptor flags |
| |
| @dfn{File descriptor flags} are miscellaneous attributes of a file |
| descriptor. These flags are associated with particular file |
| descriptors, so that if you have created duplicate file descriptors |
| from a single opening of a file, each descriptor has its own set of flags. |
| |
| Currently there is just one file descriptor flag: @code{FD_CLOEXEC}, |
| which causes the descriptor to be closed if you use any of the |
| @code{exec@dots{}} functions (@pxref{Executing a File}). |
| |
| The symbols in this section are defined in the header file |
| @file{fcntl.h}. |
| @pindex fcntl.h |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_GETFD |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should return the file descriptor flags associated |
| with the @var{filedes} argument. |
| |
| The normal return value from @code{fcntl} with this command is a |
| nonnegative number which can be interpreted as the bitwise OR of the |
| individual flags (except that currently there is only one flag to use). |
| |
| In case of an error, @code{fcntl} returns @math{-1}. The following |
| @code{errno} error conditions are defined for this command: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} argument is invalid. |
| @end table |
| @end deftypevr |
| |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_SETFD |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should set the file descriptor flags associated with the |
| @var{filedes} argument. This requires a third @code{int} argument to |
| specify the new flags, so the form of the call is: |
| |
| @smallexample |
| fcntl (@var{filedes}, F_SETFD, @var{new-flags}) |
| @end smallexample |
| |
| The normal return value from @code{fcntl} with this command is an |
| unspecified value other than @math{-1}, which indicates an error. |
| The flags and error conditions are the same as for the @code{F_GETFD} |
| command. |
| @end deftypevr |
| |
| The following macro is defined for use as a file descriptor flag with |
| the @code{fcntl} function. The value is an integer constant usable |
| as a bit mask value. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int FD_CLOEXEC |
| @cindex close-on-exec (file descriptor flag) |
| This flag specifies that the file descriptor should be closed when |
| an @code{exec} function is invoked; see @ref{Executing a File}. When |
| a file descriptor is allocated (as with @code{open} or @code{dup}), |
| this bit is initially cleared on the new file descriptor, meaning that |
| descriptor will survive into the new program after @code{exec}. |
| @end deftypevr |
| |
| If you want to modify the file descriptor flags, you should get the |
| current flags with @code{F_GETFD} and modify the value. Don't assume |
| that the flags listed here are the only ones that are implemented; your |
| program may be run years from now and more flags may exist then. For |
| example, here is a function to set or clear the flag @code{FD_CLOEXEC} |
| without altering any other flags: |
| |
| @smallexample |
| /* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,} |
| @r{or clear the flag if @var{value} is 0.} |
| @r{Return 0 on success, or -1 on error with @code{errno} set.} */ |
| |
| int |
| set_cloexec_flag (int desc, int value) |
| @{ |
| int oldflags = fcntl (desc, F_GETFD, 0); |
| /* @r{If reading the flags failed, return error indication now.} */ |
| if (oldflags < 0) |
| return oldflags; |
| /* @r{Set just the flag we want to set.} */ |
| if (value != 0) |
| oldflags |= FD_CLOEXEC; |
| else |
| oldflags &= ~FD_CLOEXEC; |
| /* @r{Store modified flag word in the descriptor.} */ |
| return fcntl (desc, F_SETFD, oldflags); |
| @} |
| @end smallexample |
| |
| @node File Status Flags |
| @section File Status Flags |
| @cindex file status flags |
| |
| @dfn{File status flags} are used to specify attributes of the opening of a |
| file. Unlike the file descriptor flags discussed in @ref{Descriptor |
| Flags}, the file status flags are shared by duplicated file descriptors |
| resulting from a single opening of the file. The file status flags are |
| specified with the @var{flags} argument to @code{open}; |
| @pxref{Opening and Closing Files}. |
| |
| File status flags fall into three categories, which are described in the |
| following sections. |
| |
| @itemize @bullet |
| @item |
| @ref{Access Modes}, specify what type of access is allowed to the |
| file: reading, writing, or both. They are set by @code{open} and are |
| returned by @code{fcntl}, but cannot be changed. |
| |
| @item |
| @ref{Open-time Flags}, control details of what @code{open} will do. |
| These flags are not preserved after the @code{open} call. |
| |
| @item |
| @ref{Operating Modes}, affect how operations such as @code{read} and |
| @code{write} are done. They are set by @code{open}, and can be fetched or |
| changed with @code{fcntl}. |
| @end itemize |
| |
| The symbols in this section are defined in the header file |
| @file{fcntl.h}. |
| @pindex fcntl.h |
| |
| @menu |
| * Access Modes:: Whether the descriptor can read or write. |
| * Open-time Flags:: Details of @code{open}. |
| * Operating Modes:: Special modes to control I/O operations. |
| * Getting File Status Flags:: Fetching and changing these flags. |
| @end menu |
| |
| @node Access Modes |
| @subsection File Access Modes |
| |
| The file access modes allow a file descriptor to be used for reading, |
| writing, or both. (On @gnuhurdsystems{}, they can also allow none of these, |
| and allow execution of the file as a program.) The access modes are chosen |
| when the file is opened, and never change. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_RDONLY |
| Open the file for read access. |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_WRONLY |
| Open the file for write access. |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_RDWR |
| Open the file for both reading and writing. |
| @end deftypevr |
| |
| On @gnuhurdsystems{} (and not on other systems), @code{O_RDONLY} and |
| @code{O_WRONLY} are independent bits that can be bitwise-ORed together, |
| and it is valid for either bit to be set or clear. This means that |
| @code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. A file access |
| mode of zero is permissible; it allows no operations that do input or |
| output to the file, but does allow other operations such as |
| @code{fchmod}. On @gnuhurdsystems{}, since ``read-only'' or ``write-only'' |
| is a misnomer, @file{fcntl.h} defines additional names for the file |
| access modes. These names are preferred when writing GNU-specific code. |
| But most programs will want to be portable to other POSIX.1 systems and |
| should use the POSIX.1 names above instead. |
| |
| @comment fcntl.h (optional) |
| @comment GNU |
| @deftypevr Macro int O_READ |
| Open the file for reading. Same as @code{O_RDONLY}; only defined on GNU. |
| @end deftypevr |
| |
| @comment fcntl.h (optional) |
| @comment GNU |
| @deftypevr Macro int O_WRITE |
| Open the file for writing. Same as @code{O_WRONLY}; only defined on GNU. |
| @end deftypevr |
| |
| @comment fcntl.h (optional) |
| @comment GNU |
| @deftypevr Macro int O_EXEC |
| Open the file for executing. Only defined on GNU. |
| @end deftypevr |
| |
| To determine the file access mode with @code{fcntl}, you must extract |
| the access mode bits from the retrieved file status flags. On |
| @gnuhurdsystems{}, |
| you can just test the @code{O_READ} and @code{O_WRITE} bits in |
| the flags word. But in other POSIX.1 systems, reading and writing |
| access modes are not stored as distinct bit flags. The portable way to |
| extract the file access mode bits is with @code{O_ACCMODE}. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_ACCMODE |
| This macro stands for a mask that can be bitwise-ANDed with the file |
| status flag value to produce a value representing the file access mode. |
| The mode will be @code{O_RDONLY}, @code{O_WRONLY}, or @code{O_RDWR}. |
| (On @gnuhurdsystems{} it could also be zero, and it never includes the |
| @code{O_EXEC} bit.) |
| @end deftypevr |
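The portable test described above might look like the following sketch. The function name @code{access_mode_name} and the strings it returns are illustrative only.

```c
#include <fcntl.h>
#include <stddef.h>

/* Return a short description of the access mode of DESC, or NULL if
   the file status flags cannot be read (errno is then set by fcntl).  */
const char *
access_mode_name (int desc)
{
  int flags = fcntl (desc, F_GETFL);
  if (flags == -1)
    return NULL;

  /* Mask first: the access modes are not independent bits on most
     systems, so testing flags & O_RDONLY directly would be wrong.  */
  switch (flags & O_ACCMODE)
    {
    case O_RDONLY:
      return "read-only";
    case O_WRONLY:
      return "write-only";
    case O_RDWR:
      return "read-write";
    default:
      return "other";
    }
}
```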
| |
| @node Open-time Flags |
| @subsection Open-time Flags |
| |
| The open-time flags specify options affecting how @code{open} will behave. |
| These options are not preserved once the file is open. The exception to |
| this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it |
| @emph{is} saved. @xref{Opening and Closing Files}, for how to call |
| @code{open}. |
| |
| There are two sorts of options specified by open-time flags. |
| |
| @itemize @bullet |
| @item |
| @dfn{File name translation flags} affect how @code{open} looks up the |
| file name to locate the file, and whether the file can be created. |
| @cindex file name translation flags |
| @cindex flags, file name translation |
| |
| @item |
| @dfn{Open-time action flags} specify extra operations that @code{open} will |
| perform on the file once it is open. |
| @cindex open-time action flags |
| @cindex flags, open-time action |
| @end itemize |
| |
| Here are the file name translation flags. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_CREAT |
| If set, the file will be created if it doesn't already exist. |
| @c !!! mode arg, umask |
| @cindex create on open (file status flag) |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_EXCL |
| If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails |
| if the specified file already exists. This is guaranteed to never |
| clobber an existing file. |
| @end deftypevr |
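One common use of this pair of flags is to claim a file name so that only one of several cooperating processes succeeds. The following is a minimal sketch; the name @code{claim_file} and its return convention are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to create NAME as a new, empty file.  Return 0 if this call
   created it, 1 if it already existed, or -1 on any other error.  */
int
claim_file (const char *name)
{
  int fd = open (name, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
  if (fd < 0)
    return errno == EEXIST ? 1 : -1;
  return close (fd) == 0 ? 0 : -1;
}
```

Because the existence check and the creation happen inside a single @code{open} call, two processes calling @code{claim_file} with the same name cannot both get 0.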
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_NONBLOCK |
| @cindex non-blocking open |
| This prevents @code{open} from blocking for a ``long time'' to open the |
| file. This is only meaningful for some kinds of files, usually devices |
| such as serial ports; when it is not meaningful, it is harmless and |
| ignored. Often opening a port to a modem blocks until the modem reports |
| carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will |
| return immediately without a carrier. |
| |
| Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating |
| mode and a file name translation flag. This means that specifying |
| @code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode; |
| @pxref{Operating Modes}. To open the file without blocking but do normal |
| I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and |
| then call @code{fcntl} to turn the bit off. |
| @end deftypevr |
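The two-step sequence described above (open with @code{O_NONBLOCK}, then clear the bit) could be sketched as follows. The helper name @code{open_without_blocking} is hypothetical, and the sketch assumes that @var{flags} does not include @code{O_CREAT}, since no mode argument is passed.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open NAME without waiting (for example, for modem carrier), then
   switch the descriptor back to ordinary blocking I/O.  FLAGS must not
   contain O_CREAT.  Return the descriptor, or -1 with errno set.  */
int
open_without_blocking (const char *name, int flags)
{
  int fd = open (name, flags | O_NONBLOCK);
  if (fd < 0)
    return -1;

  /* Fetch the current status flags and clear only O_NONBLOCK.  */
  int status = fcntl (fd, F_GETFL);
  if (status == -1 || fcntl (fd, F_SETFL, status & ~O_NONBLOCK) == -1)
    {
      int saved_errno = errno;
      close (fd);
      errno = saved_errno;
      return -1;
    }
  return fd;
}
```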
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_NOCTTY |
| If the named file is a terminal device, don't make it the controlling |
| terminal for the process. @xref{Job Control}, for information about |
| what it means to be the controlling terminal. |
| |
| On @gnuhurdsystems{} and 4.4 BSD, opening a file never makes it the |
| controlling terminal and @code{O_NOCTTY} is zero. However, @gnulinuxsystems{} |
| and some other systems use a nonzero value for @code{O_NOCTTY} and set the |
| controlling terminal when you open a file that is a terminal device; so |
| to be portable, use @code{O_NOCTTY} when it is important to avoid this. |
| @cindex controlling terminal, setting |
| @end deftypevr |
| |
| The following three file name translation flags exist only on |
| @gnuhurdsystems{}. |
| |
| @comment fcntl.h (optional) |
| @comment GNU |
| @deftypevr Macro int O_IGNORE_CTTY |
| Do not recognize the named file as the controlling terminal, even if it |
| refers to the process's existing controlling terminal device. Operations |
| on the new file descriptor will never induce job control signals. |
| @xref{Job Control}. |
| @end deftypevr |
| |
| @comment fcntl.h (optional) |
| @comment GNU |
| @deftypevr Macro int O_NOLINK |
| If the named file is a symbolic link, open the link itself instead of |
| the file it refers to. (@code{fstat} on the new file descriptor will |
| return the information returned by @code{lstat} on the link's name.) |
| @cindex symbolic link, opening |
| @end deftypevr |
| |
| @comment fcntl.h (optional) |
| @comment GNU |
| @deftypevr Macro int O_NOTRANS |
| If the named file is specially translated, do not invoke the translator. |
| Open the bare file the translator itself sees. |
| @end deftypevr |
| |
| |
| The open-time action flags tell @code{open} to do additional operations |
| which are not really related to opening the file. The reason to do them |
| as part of @code{open} instead of in separate calls is that @code{open} |
| can do them @i{atomically}. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_TRUNC |
| Truncate the file to zero length. This option is only useful for |
| regular files, not special files such as directories or FIFOs. POSIX.1 |
| requires that you open the file for writing to use @code{O_TRUNC}. In |
| BSD and GNU you must have permission to write the file to truncate it, |
| but you need not open for write access. |
| |
| This is the only open-time action flag specified by POSIX.1. There is |
| no good reason for truncation to be done by @code{open}, instead of by |
| calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in |
| Unix before @code{ftruncate} was invented, and is retained for backward |
| compatibility. |
| @end deftypevr |
| |
| The remaining open-time action flags are BSD extensions. They exist only |
| on some systems. On other systems, these macros are not defined. |
| |
| @comment fcntl.h (optional) |
| @comment BSD |
| @deftypevr Macro int O_SHLOCK |
| Acquire a shared lock on the file, as with @code{flock}. |
| @xref{File Locks}. |
| |
| If @code{O_CREAT} is specified, the locking is done atomically when |
| creating the file. You are guaranteed that no other process will get |
| the lock on the new file first. |
| @end deftypevr |
| |
| @comment fcntl.h (optional) |
| @comment BSD |
| @deftypevr Macro int O_EXLOCK |
| Acquire an exclusive lock on the file, as with @code{flock}. |
| @xref{File Locks}. This is atomic like @code{O_SHLOCK}. |
| @end deftypevr |
| |
| @node Operating Modes |
| @subsection I/O Operating Modes |
| |
| The operating modes affect how input and output operations using a file |
| descriptor work. These flags are set by @code{open} and can be fetched |
| and changed with @code{fcntl}. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_APPEND |
| The bit that enables append mode for the file. If set, then all |
| @code{write} operations write the data at the end of the file, extending |
| it, regardless of the current file position. This is the only reliable |
| way to append to a file. In append mode, you are guaranteed that the |
| data you write will always go to the current end of the file, regardless |
| of other processes writing to the file. Conversely, if you simply set |
| the file position to the end of file and write, then another process can |
| extend the file after you set the file position but before you write, |
| resulting in your data appearing someplace before the real end of file. |
| @end deftypevr |
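A log-style writer that relies on append mode might be sketched like this. The name @code{append_record} is made up for the example, and a short write is simply treated as a failure to keep the sketch brief.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Append TEXT to the file NAME, creating the file if necessary.
   Return 0 on success or -1 on failure.  */
int
append_record (const char *name, const char *text)
{
  int fd = open (name, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
  if (fd < 0)
    return -1;

  /* No lseek is needed: with O_APPEND each write goes to the end of
     the file as it is at the moment of the write.  */
  size_t len = strlen (text);
  ssize_t written = write (fd, text, len);
  int ok = (written == (ssize_t) len);

  if (close (fd) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```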
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int O_NONBLOCK |
| The bit that enables nonblocking mode for the file. If this bit is set, |
| @code{read} requests on the file can return immediately with a failure |
| status if there is no input immediately available, instead of blocking. |
| Likewise, @code{write} requests can also return immediately with a |
| failure status if the output can't be written immediately. |
| |
| Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O |
| operating mode and a file name translation flag; @pxref{Open-time Flags}. |
| @end deftypevr |
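A caller of @code{read} on a nonblocking descriptor has to tell ``no data yet'' apart from a real error. The following sketch shows one way to do that; the name @code{try_read} and its return convention are assumptions for illustration. It checks both @code{EAGAIN} and @code{EWOULDBLOCK}, which have the same value on many systems but are not required to.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Attempt one read from DESC, which should be in nonblocking mode.
   Return 1 and store the byte count in *COUNT if the read completed
   (a count of 0 means end of file), 0 if no data is available right
   now, or -1 on any other error.  */
int
try_read (int desc, void *buf, size_t size, ssize_t *count)
{
  ssize_t n;

  /* Retry if a signal interrupted the call before any data arrived.  */
  do
    n = read (desc, buf, size);
  while (n == -1 && errno == EINTR);

  if (n >= 0)
    {
      *count = n;
      return 1;
    }
  if (errno == EAGAIN || errno == EWOULDBLOCK)
    return 0;
  return -1;
}
```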
| |
| @comment fcntl.h |
| @comment BSD |
| @deftypevr Macro int O_NDELAY |
| This is an obsolete name for @code{O_NONBLOCK}, provided for |
| compatibility with BSD. It is not defined by the POSIX.1 standard. |
| @end deftypevr |
| |
| The remaining operating modes are BSD and GNU extensions. They exist only |
| on some systems. On other systems, these macros are not defined. |
| |
| @comment fcntl.h |
| @comment BSD |
| @deftypevr Macro int O_ASYNC |
| The bit that enables asynchronous input mode. If set, then @code{SIGIO} |
| signals will be generated when input is available. @xref{Interrupt Input}. |
| |
| Asynchronous input mode is a BSD feature. |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment BSD |
| @deftypevr Macro int O_FSYNC |
| The bit that enables synchronous writing for the file. If set, each |
| @code{write} call will make sure the data is reliably stored on disk before |
| returning. @c !!! xref fsync |
| |
| Synchronous writing is a BSD feature. |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment BSD |
| @deftypevr Macro int O_SYNC |
| This is another name for @code{O_FSYNC}. They have the same value. |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment GNU |
| @deftypevr Macro int O_NOATIME |
| If this bit is set, @code{read} will not update the access time of the |
| file. @xref{File Times}. This is used by programs that do backups, so |
| that backing a file up does not count as reading it. |
| Only the owner of the file or the superuser may use this bit. |
| |
| This is a GNU extension. |
| @end deftypevr |
| |
| @node Getting File Status Flags |
| @subsection Getting and Setting File Status Flags |
| |
| The @code{fcntl} function can fetch or change file status flags. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_GETFL |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| read the file status flags for the open file with descriptor |
| @var{filedes}. |
| |
| The normal return value from @code{fcntl} with this command is a |
| nonnegative number which can be interpreted as the bitwise OR of the |
| individual flags. Since the file access modes are not single-bit values, |
| you can mask off other bits in the returned flags with @code{O_ACCMODE} |
| to compare them. |
| |
| In case of an error, @code{fcntl} returns @math{-1}. The following |
| @code{errno} error conditions are defined for this command: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} argument is invalid. |
| @end table |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_SETFL |
| This macro is used as the @var{command} argument to @code{fcntl}, to set |
| the file status flags for the open file corresponding to the |
| @var{filedes} argument. This command requires a third @code{int} |
| argument to specify the new flags, so the call looks like this: |
| |
| @smallexample |
| fcntl (@var{filedes}, F_SETFL, @var{new-flags}) |
| @end smallexample |
| |
| You can't change the access mode for the file in this way; that is, |
| whether the file descriptor was opened for reading or writing. |
| |
| The normal return value from @code{fcntl} with this command is an |
| unspecified value other than @math{-1}, which indicates an error. The |
| error conditions are the same as for the @code{F_GETFL} command. |
| @end deftypevr |
| |
| If you want to modify the file status flags, you should get the current |
| flags with @code{F_GETFL} and modify the value. Don't assume that the |
| flags listed here are the only ones that are implemented; your program |
| may be run years from now and more flags may exist then. For example, |
| here is a function to set or clear the flag @code{O_NONBLOCK} without |
| altering any other flags: |
| |
| @smallexample |
| @group |
| /* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,} |
| @r{or clear the flag if @var{value} is 0.} |
| @r{Return 0 on success, or -1 on error with @code{errno} set.} */ |
| |
| int |
| set_nonblock_flag (int desc, int value) |
| @{ |
| int oldflags = fcntl (desc, F_GETFL, 0); |
| /* @r{If reading the flags failed, return error indication now.} */ |
| if (oldflags == -1) |
| return -1; |
| /* @r{Set just the flag we want to set.} */ |
| if (value != 0) |
| oldflags |= O_NONBLOCK; |
| else |
| oldflags &= ~O_NONBLOCK; |
| /* @r{Store modified flag word in the descriptor.} */ |
| return fcntl (desc, F_SETFL, oldflags); |
| @} |
| @end group |
| @end smallexample |
| |
| @node File Locks |
| @section File Locks |
| |
| @cindex file locks |
| @cindex record locking |
| The remaining @code{fcntl} commands are used to support @dfn{record |
| locking}, which permits multiple cooperating programs to prevent each |
| other from simultaneously accessing parts of a file in error-prone |
| ways. |
| |
| @cindex exclusive lock |
| @cindex write lock |
| An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access |
| for writing to the specified part of the file. While a write lock is in |
| place, no other process can lock that part of the file. |
| |
| @cindex shared lock |
| @cindex read lock |
| A @dfn{shared} or @dfn{read} lock prohibits any other process from |
| requesting a write lock on the specified part of the file. However, |
| other processes can request read locks. |
| |
| The @code{read} and @code{write} functions do not actually check to see |
| whether there are any locks in place. If you want to implement a |
| locking protocol for a file shared by multiple processes, your application |
| must do explicit @code{fcntl} calls to request and clear locks at the |
| appropriate points. |
| |
| Locks are associated with processes. A process can only have one kind |
| of lock set for each byte of a given file. When any file descriptor for |
| that file is closed by the process, all of the locks that process holds |
| on that file are released, even if the locks were made using other |
| descriptors that remain open. Likewise, locks are released when a |
| process exits, and are not inherited by child processes created using |
| @code{fork} (@pxref{Creating a Process}). |
| |
| When making a lock, use a @code{struct flock} to specify what kind of |
| lock and where. This data type and the associated macros for the |
| @code{fcntl} function are declared in the header file @file{fcntl.h}. |
| @pindex fcntl.h |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftp {Data Type} {struct flock} |
| This structure is used with the @code{fcntl} function to describe a file |
| lock. It has these members: |
| |
| @table @code |
| @item short int l_type |
| Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or |
| @code{F_UNLCK}. |
| |
| @item short int l_whence |
| This corresponds to the @var{whence} argument to @code{fseek} or |
| @code{lseek}, and specifies what the offset is relative to. Its value |
| can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}. |
| |
| @item off_t l_start |
| This specifies the offset of the start of the region to which the lock |
| applies, and is given in bytes relative to the point specified by |
| the @code{l_whence} member. |
| |
| @item off_t l_len |
| This specifies the length of the region to be locked. A value of |
| @code{0} is treated specially; it means the region extends to the end of |
| the file. |
| |
| @item pid_t l_pid |
| This field is the process ID (@pxref{Process Creation Concepts}) of the |
| process holding the lock. It is filled in by calling @code{fcntl} with |
| the @code{F_GETLK} command, but is ignored when making a lock. |
| @end table |
| @end deftp |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_GETLK |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should get information about a lock. This command |
| requires a third argument of type @w{@code{struct flock *}} to be passed |
| to @code{fcntl}, so that the form of the call is: |
| |
| @smallexample |
| fcntl (@var{filedes}, F_GETLK, @var{lockp}) |
| @end smallexample |
| |
| If there is a lock already in place that would block the lock described |
| by the @var{lockp} argument, information about that lock overwrites |
| @code{*@var{lockp}}. Existing locks are not reported if they are |
| compatible with making a new lock as specified. Thus, you should |
| specify a lock type of @code{F_WRLCK} if you want to find out about both |
| read and write locks, or @code{F_RDLCK} if you want to find out about |
| write locks only. |
| |
| There might be more than one lock affecting the region specified by the |
| @var{lockp} argument, but @code{fcntl} only returns information about |
| one of them. The @code{l_whence} member of the @var{lockp} structure is |
| set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields |
| set to identify the locked region. |
| |
| If no lock applies, the only change to the @var{lockp} structure is to |
| update the @code{l_type} to a value of @code{F_UNLCK}. |
| |
| The normal return value from @code{fcntl} with this command is an |
| unspecified value other than @math{-1}, which is reserved to indicate an |
| error. The following @code{errno} error conditions are defined for |
| this command: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} argument is invalid. |
| |
| @item EINVAL |
| Either the @var{lockp} argument doesn't specify valid lock information, |
| or the file associated with @var{filedes} doesn't support locks. |
| @end table |
| @end deftypevr |
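| |
| As a rough sketch of how this command might be used, the function below |
| asks whether a write lock on the whole file would be blocked and, if |
| so, reports the process holding the conflicting lock. The name |
| @code{find_lock_holder} is hypothetical and not part of the library, |
| and the sketch assumes that @var{filedes} refers to a file that supports |
| locks. |

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the ID of a process holding a lock that would block a write
   lock on the whole file open on FILEDES, 0 if no lock would block
   it, or -1 if fcntl fails.  (Hypothetical helper.)  */
pid_t
find_lock_holder (int filedes)
{
  struct flock lock;

  lock.l_type = F_WRLCK;      /* Ask about both read and write locks.  */
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;             /* Zero length: through end of file.  */

  if (fcntl (filedes, F_GETLK, &lock) == -1)
    return -1;

  if (lock.l_type == F_UNLCK)
    return 0;                 /* Nothing would block the request.  */

  return lock.l_pid;
}
```

| Because locks held by the calling process never conflict with its own |
| requests, this function reports only locks held by other processes. |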
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_SETLK |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should set or clear a lock. This command requires a |
| third argument of type @w{@code{struct flock *}} to be passed to |
| @code{fcntl}, so that the form of the call is: |
| |
| @smallexample |
| fcntl (@var{filedes}, F_SETLK, @var{lockp}) |
| @end smallexample |
| |
| If the process already has a lock on any part of the region, the old lock |
| on that part is replaced with the new lock. You can remove a lock |
| by specifying a lock type of @code{F_UNLCK}. |
| |
| If the lock cannot be set, @code{fcntl} returns immediately with a value |
| of @math{-1}. This command does not block waiting for other processes |
| to release locks. If @code{fcntl} succeeds, it returns a value other |
| than @math{-1}. |
| |
| The following @code{errno} error conditions are defined for this |
| command: |
| |
| @table @code |
| @item EAGAIN |
| @itemx EACCES |
| The lock cannot be set because it is blocked by an existing lock on the |
| file. Some systems use @code{EAGAIN} in this case, and other systems |
| use @code{EACCES}; your program should treat them alike, after |
| @code{F_SETLK}. (@gnulinuxhurdsystems{} always use @code{EAGAIN}.) |
| |
| @item EBADF |
| Either: the @var{filedes} argument is invalid; you requested a read lock |
| but the @var{filedes} is not open for read access; or, you requested a |
| write lock but the @var{filedes} is not open for write access. |
| |
| @item EINVAL |
| Either the @var{lockp} argument doesn't specify valid lock information, |
| or the file associated with @var{filedes} doesn't support locks. |
| |
| @item ENOLCK |
| The system has run out of file lock resources; there are already too |
| many file locks in place. |
| |
| Well-designed file systems never report this error, because they have no |
| limitation on the number of locks. However, you must still take account |
| of the possibility of this error, as it could result from network access |
| to a file system on another machine. |
| @end table |
| @end deftypevr |
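| |
| The sketch below shows one way to attempt a lock without waiting while |
| treating @code{EAGAIN} and @code{EACCES} alike, as recommended above. |
| The function name @code{try_write_lock} and its return convention are |
| assumptions made for this illustration. |

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to place a write lock on the whole file open on FILEDES
   without waiting.  Return 1 if the lock was set, 0 if a lock held
   by another process blocks it, or -1 on any other error.
   (Hypothetical helper.)  */
int
try_write_lock (int filedes)
{
  struct flock lock;

  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;             /* Zero length: through end of file.  */

  if (fcntl (filedes, F_SETLK, &lock) != -1)
    return 1;

  /* Systems differ in which of these two codes they report.  */
  if (errno == EAGAIN || errno == EACCES)
    return 0;

  return -1;
}
```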
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @deftypevr Macro int F_SETLKW |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should set or clear a lock. It is just like the |
| @code{F_SETLK} command, but causes the process to block (or wait) |
| until the request can be satisfied. |
| |
| This command requires a third argument of type @code{struct flock *}, as |
| for the @code{F_SETLK} command. |
| |
| The @code{fcntl} return values and errors are the same as for the |
| @code{F_SETLK} command, but these additional @code{errno} error conditions |
| are defined for this command: |
| |
| @table @code |
| @item EINTR |
| The function was interrupted by a signal while it was waiting. |
| @xref{Interrupted Primitives}. |
| |
| @item EDEADLK |
| The specified region is being locked by another process. But that |
| process is waiting to lock a region which the current process has |
| locked, so waiting for the lock would result in deadlock. The system |
| does not guarantee that it will detect all such conditions, but it lets |
| you know if it notices one. |
| @end table |
| @end deftypevr |
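| |
| A program that waits for a lock usually has to decide what to do when a |
| signal interrupts the wait. The following sketch simply retries after |
| @code{EINTR} and reports every other error, including @code{EDEADLK}, to |
| the caller. Both the function name @code{wait_for_lock} and the retry |
| policy are assumptions of this example. Some programs would instead |
| want to give up when interrupted. |

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Wait until a lock of type TYPE (F_RDLCK or F_WRLCK) can be set on
   the whole file open on FILEDES.  Return 0 on success or -1 on
   error.  (Hypothetical helper.)  */
int
wait_for_lock (int filedes, int type)
{
  struct flock lock;

  lock.l_type = type;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;             /* Zero length: through end of file.  */

  while (fcntl (filedes, F_SETLKW, &lock) == -1)
    {
      /* Retry only if a signal interrupted the wait.  */
      if (errno != EINTR)
        return -1;
    }

  return 0;
}
```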
| |
| |
| The following macros are defined for use as values for the @code{l_type} |
| member of the @code{flock} structure. The values are integer constants. |
| |
| @table @code |
| @comment fcntl.h |
| @comment POSIX.1 |
| @vindex F_RDLCK |
| @item F_RDLCK |
| This macro is used to specify a read (or shared) lock. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @vindex F_WRLCK |
| @item F_WRLCK |
| This macro is used to specify a write (or exclusive) lock. |
| |
| @comment fcntl.h |
| @comment POSIX.1 |
| @vindex F_UNLCK |
| @item F_UNLCK |
| This macro is used to specify that the region is unlocked. |
| @end table |
| |
| As an example of a situation where file locking is useful, consider a |
| program that can be run simultaneously by several different users, that |
| logs status information to a common file. One example of such a program |
| might be a game that uses a file to keep track of high scores. Another |
| example might be a program that records usage or accounting information |
| for billing purposes. |
| |
| Having multiple copies of the program simultaneously writing to the |
| file could cause the contents of the file to become mixed up. But |
| you can prevent this kind of problem by setting a write lock on the |
| file before actually writing to it. |
| |
| If the program also needs to read the file and wants to make sure that |
| the contents of the file are in a consistent state, then it can also use |
| a read lock. While the read lock is set, no other process can lock |
| that part of the file for writing. |
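| |
| For the shared log file described above, the locking protocol might look |
| like the following sketch. The function @code{append_record} is a |
| hypothetical helper. It locks the whole file, appends one record, and |
| then unlocks, and for brevity it does not retry if a signal interrupts |
| the wait for the lock. |

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Append the string LINE to the file open on FILEDES while holding a
   write lock on the whole file.  Return 0 on success or -1 on error.
   (Hypothetical helper.)  */
int
append_record (int filedes, const char *line)
{
  struct flock lock;
  size_t len = strlen (line);
  int result = 0;

  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;             /* Zero length: through end of file.  */

  /* Wait until no other process holds a conflicting lock.  */
  if (fcntl (filedes, F_SETLKW, &lock) == -1)
    return -1;

  /* Find the end of the file only after the lock is held, so that
     records written by other processes are not overwritten.  */
  if (lseek (filedes, 0, SEEK_END) == -1
      || write (filedes, line, len) != (ssize_t) len)
    result = -1;

  /* Release the lock whether or not the write succeeded.  */
  lock.l_type = F_UNLCK;
  if (fcntl (filedes, F_SETLK, &lock) == -1)
    result = -1;

  return result;
}
```

| This protects the file only against other programs that follow the same |
| protocol. |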
| |
| @c ??? This section could use an example program. |
| |
| Remember that file locks are only a @emph{voluntary} protocol for |
| controlling access to a file. There is still potential for access to |
| the file by programs that don't use the lock protocol. |
| |
| @node Interrupt Input |
| @section Interrupt-Driven Input |
| |
| @cindex interrupt-driven input |
| If you set the @code{O_ASYNC} status flag on a file descriptor |
| (@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever |
| input or output becomes possible on that file descriptor. The process |
| or process group to receive the signal can be selected by using the |
| @code{F_SETOWN} command to the @code{fcntl} function. If the file |
| descriptor is a socket, this also selects the recipient of @code{SIGURG} |
| signals that are delivered when out-of-band data arrives on that socket; |
| see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation |
| where @code{select} would report the socket as having an ``exceptional |
| condition''. @xref{Waiting for I/O}.) |
| |
| If the file descriptor corresponds to a terminal device, then @code{SIGIO} |
| signals are sent to the foreground process group of the terminal. |
| @xref{Job Control}. |
| |
| @pindex fcntl.h |
| The symbols in this section are defined in the header file |
| @file{fcntl.h}. |
| |
| @comment fcntl.h |
| @comment BSD |
| @deftypevr Macro int F_GETOWN |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should get information about the process or process |
| group to which @code{SIGIO} signals are sent. (For a terminal, this is |
| actually the foreground process group ID, which you can get using |
| @code{tcgetpgrp}; see @ref{Terminal Access Functions}.) |
| |
| The return value is interpreted as a process ID; if negative, its |
| absolute value is the process group ID. |
| |
| The following @code{errno} error condition is defined for this command: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} argument is invalid. |
| @end table |
| @end deftypevr |
| |
| @comment fcntl.h |
| @comment BSD |
| @deftypevr Macro int F_SETOWN |
| This macro is used as the @var{command} argument to @code{fcntl}, to |
| specify that it should set the process or process group to which |
| @code{SIGIO} signals are sent. This command requires a third argument |
| of type @code{pid_t} to be passed to @code{fcntl}, so that the form of |
| the call is: |
| |
| @smallexample |
| fcntl (@var{filedes}, F_SETOWN, @var{pid}) |
| @end smallexample |
| |
| The @var{pid} argument should be a process ID. You can also pass a |
| negative number whose absolute value is a process group ID. |
| |
| The return value from @code{fcntl} with this command is @math{-1} |
| in case of error and some other value if successful. The following |
| @code{errno} error conditions are defined for this command: |
| |
| @table @code |
| @item EBADF |
| The @var{filedes} argument is invalid. |
| |
| @item ESRCH |
| There is no process or process group corresponding to @var{pid}. |
| @end table |
| @end deftypevr |
| |
| @c ??? This section could use an example program. |
| |
| @node IOCTLs |
| @section Generic I/O Control Operations |
| @cindex generic i/o control operations |
| @cindex IOCTLs |
| |
| @gnusystems{} can handle most input/output operations on many different |
| devices and objects in terms of a few file primitives: @code{read}, |
| @code{write} and @code{lseek}. However, most devices also have a few |
| peculiar operations which do not fit into this model, such as: |
| |
| @itemize @bullet |
| |
| @item |
| Changing the character font used on a terminal. |
| |
| @item |
| Telling a magnetic tape system to rewind or fast forward. (Since tapes |
| cannot be positioned in byte increments, @code{lseek} is inapplicable.) |
| |
| @item |
| Ejecting a disk from a drive. |
| |
| @item |
| Playing an audio track from a CD-ROM drive. |
| |
| @item |
| Maintaining routing tables for a network. |
| |
| @end itemize |
| |
| Although some objects, such as sockets and terminals |
| @footnote{Actually, the terminal-specific functions are implemented with |
| IOCTLs on many platforms.}, have special functions of their own, it would |
| not be practical to create functions for all these cases. |
| |
| Instead, these minor operations, known as @dfn{IOCTL}s, are assigned code |
| numbers and multiplexed through the @code{ioctl} function, defined in |
| @file{sys/ioctl.h}. The code numbers themselves are defined in many |
| different headers. |
| |
| @comment sys/ioctl.h |
| @comment BSD |
| @deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{}) |
| @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}} |
| |
| The @code{ioctl} function performs the generic I/O operation |
| @var{command} on @var{filedes}. |
| |
| A third argument is usually present, either a single number or a pointer |
| to a structure. The meaning of this argument, the returned value, and |
| any error codes depend upon the command used. Often @math{-1} is |
| returned for a failure. |
| |
| @end deftypefun |
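| |
| As a small illustration of the calling convention, the sketch below uses |
| the @code{FIONREAD} command, which many systems (including |
| @gnulinuxsystems{}) define in @file{sys/ioctl.h}, to ask how many bytes |
| are waiting to be read. Whether a given kind of file supports this |
| command is system-dependent, and the function name |
| @code{bytes_available} is hypothetical. |

```c
#include <sys/ioctl.h>

/* Return the number of bytes that can currently be read from
   FILEDES, or -1 if the request fails.  (Hypothetical helper; assumes
   the system provides FIONREAD.)  */
int
bytes_available (int filedes)
{
  int count;

  /* For this command the third argument points to an int in which
     the result is stored.  */
  if (ioctl (filedes, FIONREAD, &count) == -1)
    return -1;

  return count;
}
```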
| |
| On some systems, IOCTLs used by different devices share the same numbers. |
| Thus, although use of an inappropriate IOCTL @emph{usually} only produces |
| an error, you should not attempt to use device-specific IOCTLs on an |
| unknown device. |
| |
| Most IOCTLs are OS-specific and/or only used in special system utilities, |
| and are thus beyond the scope of this document. For an example of the use |
| of an IOCTL, see @ref{Out-of-Band Data}. |
| |
| @c FIXME this is undocumented: |
| @c dup3 |