type=page
status=published
title=High Availability in {productName}
next=ssh-setup.html
prev=preface.html
~~~~~~

= High Availability in {productName}

[[GSHAG00002]][[abdaq]]


[[high-availability-in-glassfish-server]]
== 1 High Availability in {productName}

This chapter describes the high availability features in {productName} 7.

The following topics are addressed here:

* link:#abdar[Overview of High Availability]
* link:#gaymr[How {productName} Provides High Availability]
* link:#gbcot[Recovering from Failures]
* link:#abdaz[More Information]

[[abdar]][[GSHAG00168]][[overview-of-high-availability]]

=== Overview of High Availability

High availability applications and services provide their functionality
continuously, regardless of hardware and software failures. To make such
reliability possible, {productName} provides mechanisms for
maintaining application state data between clustered {productName}
instances. Application state data, such as HTTP session data, stateful
EJB sessions, and dynamic cache information, is replicated in real time
across server instances. If any one server instance goes down, the
session state is available to the next failover server, resulting in
minimum application downtime and enhanced transactional security.

{productName} provides the following high availability features:

* link:#gksdm[Load Balancing With the Apache `mod_jk` or `mod_proxy_ajp` Module]
* link:#gaynn[High Availability Session Persistence]
* link:#gayna[High Availability Java Message Service]
* link:#gaymz[RMI-IIOP Load Balancing and Failover]

[[gksdm]][[GSHAG00252]][[load-balancing-with-the-apache-mod_jk-or-mod_proxy_ajp-module]]

==== Load Balancing With the Apache `mod_jk` or `mod_proxy_ajp` Module

A common load balancing configuration for {productName} 7 is to use
the Apache HTTP Server as the web server front-end, and the Apache
`mod_jk` or `mod_proxy_ajp` module as the connector between the web
server and {productName}. See
link:http-load-balancing.html#gksdt[Configuring {productName} with
Apache HTTP Server and `mod_jk`] and
link:http-load-balancing.html#CHDCCGDC[Configuring {productName} with
Apache HTTP Server and `mod_proxy_ajp`] for more information.

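On the {productName} side, this configuration requires an AJP-capable
network listener on the instances that the web server forwards to. The
following is a minimal sketch, not the complete procedure from the linked
chapter; the target cluster name `cluster1`, the listener name
`jk-connector`, and port 8009 are illustrative values.

----
# Create a jk-enabled (AJP) listener so that Apache's mod_jk or
# mod_proxy_ajp module can forward requests to the cluster's instances.
# Target, listener name, and port are illustrative; adjust to your topology.
asadmin create-network-listener --target cluster1 \
  --protocol http-listener-1 --listenerport 8009 \
  --jkenabled true jk-connector
----
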
[[gaynn]][[GSHAG00253]][[high-availability-session-persistence]]

==== High Availability Session Persistence

{productName} provides high availability of HTTP requests and session
data (both HTTP session data and stateful session bean data).

Jakarta EE applications typically have significant amounts of session state
data. A web shopping cart is the classic example of session state.
Also, an application can cache frequently needed data in the session
object. In fact, almost all applications with significant user
interactions need to maintain session state. Both HTTP sessions and
stateful session beans (SFSBs) have session state data.

Preserving session state across server failures can be important to end
users. If the {productName} instance hosting the user session
experiences a failure, the session state can be recovered, and the
session can continue without loss of information. High availability is
implemented in {productName} by means of in-memory session
replication on {productName} instances running in a cluster.

For more information about in-memory session replication in
{productName}, see link:#gaymr[How {productName} Provides High
Availability]. For detailed instructions on configuring high
availability session persistence, see
link:session-persistence-and-failover.html#abdkz[Configuring High
Availability Session Persistence and Failover].

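As a minimal sketch of enabling this for an application, availability can
be requested at deployment time and the web container's persistence type
inspected for the cluster's configuration. The cluster name `cluster1` and
the application archive `clusterjsp.war` are illustrative; the full
procedure is in the linked chapter.

----
# Deploy with availability enabled so that HTTP session and SFSB state
# are replicated across the cluster (names are illustrative).
asadmin deploy --target cluster1 --availabilityenabled=true clusterjsp.war

# Check that the web container is set to in-memory (replicated) persistence.
asadmin get cluster1-config.availability-service.web-container-availability.persistence-type
----

Note that the web application itself must also be marked `<distributable/>`
in its `web.xml` deployment descriptor for its sessions to be replicated.
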
[[gayna]][[GSHAG00254]][[high-availability-java-message-service]]

==== High Availability Java Message Service

{productName} supports the Java Message Service (JMS) API and JMS
messaging through its built-in `jmsra` resource adapter communicating with
Open Message Queue as the JMS provider. This combination is often called
the JMS Service.

The JMS service makes JMS messaging highly available as follows:

Message Queue Broker Clusters::
By default, when a GlassFish cluster is created, the JMS service
automatically configures a Message Queue broker cluster to provide JMS
messaging services, with one clustered broker assigned to each cluster
instance. This automatically created broker cluster is configurable to
take advantage of the two types of broker clusters, conventional and
enhanced, supported by Message Queue (a configuration sketch follows
this list). +
Additionally, Message Queue broker clusters created and managed using
Message Queue itself can be used as external, or remote, JMS hosts.
Using external broker clusters provides additional deployment options,
such as deploying Message Queue brokers on different hosts from the
GlassFish instances they service, or deploying different numbers of
Message Queue brokers and GlassFish instances. +
For more information about Message Queue clustering, see
link:jms.html#abdbx[Using Message Queue Broker Clusters With {productName}].
Connection Failover::
The use of Message Queue broker clusters allows connection failover in
the event of a broker failure. If the primary JMS host (Message Queue
broker) in use by a GlassFish instance fails, connections to the
failed JMS host will automatically fail over to another host in the
JMS host list, allowing messaging operations to continue and
maintaining JMS messaging semantics. +
For more information about JMS connection failover, see
link:jms.html#abdbv[Connection Failover].

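As a rough sketch of reconfiguring the automatically created broker
cluster, the `configure-jms-cluster` subcommand selects the Message Queue
cluster type for a GlassFish cluster. The cluster name and option values
below are illustrative, and the subcommand has constraints on when it may
be run; see the linked chapter for the full set of options.

----
# Configure cluster1's JMS service as a conventional MQ broker cluster
# that keeps its cluster configuration on a master broker (illustrative values).
asadmin configure-jms-cluster --clustertype=conventional \
  --configstoretype=masterbroker cluster1
----
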
[[gaymz]][[GSHAG00255]][[rmi-iiop-load-balancing-and-failover]]

==== RMI-IIOP Load Balancing and Failover

With RMI-IIOP load balancing, IIOP client requests are distributed to
different server instances or name servers, which spreads the load
evenly across the cluster, providing scalability. IIOP load balancing
combined with EJB clustering and availability also provides EJB
failover.

When a client performs a JNDI lookup for an object, the Naming Service
essentially binds the request to a particular server instance. From then
on, all lookup requests made from that client are sent to the same
server instance, and thus all `EJBHome` objects will be hosted on the
same target server. Any bean references obtained henceforth are also
created on the same target host. This effectively provides load
balancing, since all clients randomize the list of target servers when
performing JNDI lookups. If the target server instance goes down, the
lookup or EJB method invocation will fail over to another server
instance.

IIOP load balancing and failover happen transparently. No special steps
are needed during application deployment. If the {productName}
instance on which the application client is deployed participates in a
cluster, {productName} finds all currently active IIOP endpoints
in the cluster automatically. However, a client should have at least two
endpoints specified for bootstrapping purposes, in case one of the
endpoints has failed.

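For example, a standalone application client can be given two or more
bootstrap endpoints through the `com.sun.appserv.iiop.endpoints` system
property. The host names and client JAR below are illustrative; 3700 is
the default IIOP listener port.

----
# Launch an application client with two IIOP bootstrap endpoints so the
# initial naming lookup can fail over if one host is down (illustrative names).
java -Dcom.sun.appserv.iiop.endpoints=host1.example.com:3700,host2.example.com:3700 \
  -jar my-app-client.jar
----
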
For more information on RMI-IIOP load balancing and failover, see
link:rmi-iiop.html#fxxqs[RMI-IIOP Load Balancing and Failover].

[[gaymr]][[GSHAG00169]][[how-glassfish-server-provides-high-availability]]

=== How {productName} Provides High Availability

{productName} provides high availability through the following
subcomponents and features:

* link:#gjghv[Storage for Session State Data]
* link:#abdax[Highly Available Clusters]

[[gjghv]][[GSHAG00256]][[storage-for-session-state-data]]

==== Storage for Session State Data

Storing session state data enables the session state to be recovered
after the failover of a server instance in a cluster. Recovering the
session state enables the session to continue without loss of
information. {productName} supports in-memory session replication on
other servers in the cluster for maintaining HTTP session and stateful
session bean data.

In-memory session replication is implemented in {productName} 7 as
an OSGi module. Internally, the replication module uses a consistent
hash algorithm to pick a replica server instance within a cluster of
instances. This allows the replication module to easily locate the
replica or replicated data when a container needs to retrieve the data.

The use of in-memory replication requires the Group Management Service
(GMS) to be enabled. For more information about GMS, see
link:clusters.html#gjfnl[Group Management Service].

If server instances in a cluster are located on different hosts, ensure
that the following prerequisites are met:

* To ensure that GMS and in-memory replication function correctly, the
hosts must be on the same subnet (a quick check is sketched after this
list).
* To ensure that in-memory replication functions correctly, the system
clocks on all hosts in the cluster must be synchronized as closely as
possible.

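The subnet prerequisite can be checked with the `validate-multicast`
utility, because GMS relies on multicast traffic by default, and GMS
health can be inspected once the cluster is running. The cluster name
`cluster1` is illustrative.

----
# Run on each host in turn to confirm that multicast traffic used by GMS
# reaches the other hosts on the subnet.
asadmin validate-multicast

# With the cluster running, report the GMS health view of each instance
# (cluster name is illustrative).
asadmin get-health cluster1
----
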
[[abdax]][[GSHAG00257]][[highly-available-clusters]]

==== Highly Available Clusters

A highly available cluster integrates a state replication service with
clusters and a load balancer.


[NOTE]
====
When implementing a highly available cluster, use a load balancer that
includes session-based stickiness as part of its load-balancing
algorithm. Otherwise, session data can be misdirected or lost.
An example of a load balancer that includes session-based stickiness is the
Loadbalancer Plug-In available in {productName}.
====


[[abday]][[GSHAG00218]][[clusters-instances-sessions-and-load-balancing]]

===== Clusters, Instances, Sessions, and Load Balancing

Clusters, server instances, load balancers, and sessions are related as
follows:

* A server instance is not required to be part of a cluster. However, an
instance that is not part of a cluster cannot take advantage of high
availability through transfer of session state from one instance to
other instances.
* The server instances within a cluster can be hosted on one or multiple
hosts. You can group server instances across different hosts into a
cluster.
* A particular load balancer can forward requests to server instances on
multiple clusters. You can use this ability of the load balancer to
perform an online upgrade without loss of service. For more information,
see link:rolling-upgrade.html#abdin[Upgrading in Multiple Clusters].
* A single cluster can receive requests from multiple load balancers. If
a cluster is served by more than one load balancer, you must configure
the cluster in exactly the same way on each load balancer.
* Each session is tied to a particular cluster. Therefore, although you
can deploy an application on multiple clusters, session failover will
occur only within a single cluster.

The cluster thus acts as a safe boundary for session failover for the
server instances within the cluster. You can use the load balancer and
upgrade components within {productName} without loss of service.

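A minimal sketch of creating such a cluster across two hosts follows.
The cluster, node, and instance names are illustrative, and the nodes
are assumed to have been created already.

----
# Create a cluster with one instance on each of two existing nodes,
# then start the whole cluster (all names are illustrative).
asadmin create-cluster cluster1
asadmin create-instance --cluster cluster1 --node node1 instance1
asadmin create-instance --cluster cluster1 --node node2 instance2
asadmin start-cluster cluster1
----
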
[[gktax]][[GSHAG00219]][[protocols-for-centralized-cluster-administration]]

===== Protocols for Centralized Cluster Administration

{productName} uses the Distributed Component Object Model (DCOM)
remote protocol or secure shell (SSH) to ensure that clusters that span
multiple hosts can be administered centrally. To perform administrative
operations on {productName} instances that are remote from the domain
administration server (DAS), the DAS must be able to communicate with
those instances. If an instance is running, the DAS connects to the
running instance directly. For example, when you deploy an application
to an instance, the DAS connects to the instance and deploys the
application to the instance.

However, the DAS cannot connect to an instance to perform operations on
an instance that is not running, such as creating or starting the
instance. For these operations, the DAS uses DCOM or SSH to contact a
remote host and administer instances there. DCOM or SSH provides
confidentiality and security for data that is exchanged between the DAS
and remote hosts.


[NOTE]
====
The use of DCOM or SSH to enable centralized administration of remote
instances is optional. If the use of DCOM or SSH is not feasible in your
environment, you can administer remote instances locally.
====


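As a minimal sketch of the SSH case, key distribution and node creation
might look as follows; the host name, installation directory, and node
name are illustrative.

----
# Distribute the DAS administrator's SSH key to a remote host so the DAS
# can reach it over SSH (host name is illustrative).
asadmin setup-ssh host2.example.com

# Register the remote host as an SSH node that the DAS administers centrally.
asadmin create-node-ssh --nodehost host2.example.com \
  --installdir /opt/glassfish7 node2
----
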
For more information, see link:ssh-setup.html#gkshg[Enabling Centralized
Administration of {productName} Instances].

[[gbcot]][[GSHAG00170]][[recovering-from-failures]]

=== Recovering from Failures

You can use various techniques to manually recover individual
subcomponents after hardware failures such as disk crashes.

The following topics are addressed here:

* link:#gcmkp[Recovering the Domain Administration Server]
* link:#gcmkc[Recovering {productName} Instances]
* link:#gcmjs[Recovering the HTTP Load Balancer and Web Server]
* link:#gcmjr[Recovering Message Queue]

[[gcmkp]][[GSHAG00258]][[recovering-the-domain-administration-server]]

==== Recovering the Domain Administration Server

Loss of the Domain Administration Server (DAS) affects only
administration. {productName} clusters and standalone instances, and
the applications deployed to them, continue to run as before, even if
the DAS is not reachable.

Use any of the following methods to recover the DAS:

* Back up the domain periodically, so that you have snapshots of its
configuration (as sketched after this list). After a hardware failure,
re-create the DAS on a new host, as described in
"link:administration-guide/domains.html#GSADG00542[Re-Creating the Domain Administration Server (DAS)]"
in {productName} Administration Guide.
* Put the domain installation and configuration on a shared and robust
file system (NFS, for example). If the primary DAS host fails, a second
host is brought up with the same IP address and takes over with
manual intervention or user-supplied automation.
* Zip the {productName} installation and domain root directory.
Restore it on the new host, assigning it the same network identity.

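A minimal sketch of the backup-based approach uses the `backup-domain`
and `restore-domain` subcommands. The domain name and backup directory
are illustrative, and both subcommands have documented constraints (for
example, on whether the domain may be running) that apply.

----
# Take a periodic snapshot of the domain's configuration
# (domain name and backup directory are illustrative).
asadmin backup-domain --backupdir /backups domain1

# On the replacement DAS host, restore the most recent backup of the domain.
asadmin restore-domain --backupdir /backups domain1
----
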
[[gcmkc]][[GSHAG00259]][[recovering-glassfish-server-instances]]

==== Recovering {productName} Instances

{productName} provides tools for backing up and restoring {productName}
instances. For more information, see link:instances.html#gksdy[To
Resynchronize an Instance and the DAS Offline].

[[gcmjs]][[GSHAG00260]][[recovering-the-http-load-balancer-and-web-server]]

==== Recovering the HTTP Load Balancer and Web Server

There are no explicit commands to back up only a web server
configuration. Simply zip the web server installation directory. After a
failure, unzip the saved backup on a new host with the same network
identity. If the new host has a different IP address, update the DNS
server or the routers.

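For example, with a web server installed under `/opt/SUNWwbsvr` (the
path is illustrative), the backup and restore steps might be:

----
# On the original host: archive the entire web server installation directory.
zip -r webserver-backup.zip /opt/SUNWwbsvr

# On the replacement host (same network identity): restore it in place.
unzip webserver-backup.zip -d /
----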

[NOTE]
====
This assumes that the web server is either reinstalled or restored from
an image first.
====


The Load Balancer Plug-In (`plugins` directory) and configurations are
in the web server installation directory, typically `/opt/SUNWwbsvr`.
The web-install/web-instance/config directory contains the
`loadbalancer.xml` file.

[[gcmjr]][[GSHAG00261]][[recovering-message-queue]]

==== Recovering Message Queue

When a Message Queue broker becomes unavailable, the method you use to
restore the broker to operation depends on the nature of the failure
that caused the broker to become unavailable:

* Power failure or failure other than disk storage
* Failure of disk storage

Additionally, the urgency of restoring an unavailable broker to
operation depends on the type of the broker:

* Standalone Broker. When a standalone broker becomes unavailable, both
service availability and data availability are interrupted. Restore the
broker to operation as soon as possible to restore availability.
* Broker in a Conventional Cluster. When a broker in a conventional
cluster becomes unavailable, service availability continues to be
provided by the other brokers in the cluster. However, data availability
of the persistent data stored by the unavailable broker is interrupted.
Restore the broker to operation to restore availability of its
persistent data.
* Broker in an Enhanced Cluster. When a broker in an enhanced cluster
becomes unavailable, service availability and data availability continue
to be provided by the other brokers in the cluster. Restore the broker
to operation to return the cluster to its previous capacity.

[[glaiv]][[GSHAG00220]][[recovering-from-power-failure-and-failures-other-than-disk-storage]]

===== Recovering From Power Failure and Failures Other Than Disk Storage

When a host is affected by a power failure or failure of a non-disk
component such as memory, processor or network card, restore Message
Queue brokers on the affected host by starting the brokers after the
failure has been remedied.

To start brokers serving as Embedded or Local JMS hosts, start the
GlassFish instances the brokers are servicing. To start brokers serving
as Remote JMS hosts, use the `imqbrokerd` Message Queue utility.

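For example, a broker serving as a Remote JMS host might be restarted
as follows; the broker instance name is illustrative, and 7676 is the
default broker port.

----
# Restart a standalone (Remote JMS host) broker after the host is back up
# (broker instance name and port are illustrative).
imqbrokerd -name broker1 -port 7676
----
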
[[glaiu]][[GSHAG00221]][[recovering-from-failure-of-disk-storage]]

===== Recovering from Failure of Disk Storage

Message Queue uses disk storage for software, configuration files and
persistent data stores. In a default GlassFish installation, all three
of these are generally stored on the same disk: the Message Queue
software in as-install-parent/mq, and broker configuration files and
persistent data stores (except for the persistent data stores of
enhanced clusters, which are housed in highly available databases) in
domain-dir/imq. If this disk fails, restoring brokers to operation is
impossible unless you have previously created a backup of these items.
To create such a backup, use a utility such as `zip`, `gzip` or `tar` to
create archives of these directories and all their content. When
creating the backup, you should first quiesce all brokers and physical
destinations, as described in "link:../openmq/mq-admin-guide/broker-management.html#GMADG00522[Quiescing a Broker]" and
"link:../openmq/mq-admin-guide/message-delivery.html#GMADG00533[Pausing and Resuming a Physical Destination]" in Open
Message Queue Administration Guide, respectively. Then, after the failed
disk is replaced and put into service, expand the backup archive into
the same location.

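A minimal sketch of creating such a backup, assuming default locations
under an installation at `/opt/glassfish7` with a domain named `domain1`
(both illustrative), and with the brokers and destinations already
quiesced:

----
# Archive the Message Queue software, broker configuration files, and
# file-based persistent data stores (paths are illustrative defaults).
tar czf mq-backup.tar.gz \
  /opt/glassfish7/mq \
  /opt/glassfish7/glassfish/domains/domain1/imq
----
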
Restoring the Persistent Data Store From Backup. For many messaging
applications, restoring a persistent data store from backup does not
produce the desired results because the backed-up store does not
represent the content of the store when the disk failure occurred. In
some applications, the persistent data changes rapidly enough to make
backups obsolete as soon as they are created. To avoid issues in
restoring a persistent data store, consider using a RAID or SAN data
storage solution that is fault tolerant, especially for data stores in
production environments.

[[abdaz]][[GSHAG00171]][[more-information]]

=== More Information

For information about planning a high-availability deployment, including
assessing hardware requirements, planning network configuration, and
selecting a topology, see the
link:deployment-planning-guide.html#GSPLG[{productName} Deployment
Planning Guide]. This manual also provides a high-level introduction to
concepts such as:

* {productName} components such as node agents, domains, and clusters
* IIOP load balancing in a cluster
* Message queue failover

For more information about developing applications that take advantage
of high availability features, see the
link:application-development-guide.html#GSDVG[{productName} Application
Development Guide].

For information on how to configure and tune applications and
{productName} for best performance with high availability, see the
link:performance-tuning-guide.html#GSPTG[{productName} Performance Tuning
Guide], which discusses topics such as:

* Tuning persistence frequency and persistence scope
* Checkpointing stateful session beans
* Configuring the JDBC connection pool
* Session size
* Configuring load balancers for best performance