| type=page |
| status=published |
| title=High Availability in {productName} |
| next=ssh-setup.html |
| prev=preface.html |
| ~~~~~~ |
| |
| = High Availability in {productName} |
| |
| [[GSHAG00002]][[abdaq]] |
| |
| |
| [[high-availability-in-glassfish-server]] |
| == 1 High Availability in {productName} |
| |
| This chapter describes the high availability features in {productName} 7. |
| |
| The following topics are addressed here: |
| |
| * link:#abdar[Overview of High Availability] |
| * link:#gaymr[How {productName} Provides High Availability] |
| * link:#gbcot[Recovering from Failures] |
| * link:#abdaz[More Information] |
| |
| [[abdar]][[GSHAG00168]][[overview-of-high-availability]] |
| |
| === Overview of High Availability |
| |
| High availability applications and services provide their functionality |
| continuously, regardless of hardware and software failures. To make such |
| reliability possible, {productName} provides mechanisms for |
| maintaining application state data between clustered {productName} |
| instances. Application state data, such as HTTP session data, stateful |
| EJB sessions, and dynamic cache information, is replicated in real time |
| across server instances. If any one server instance goes down, the |
| session state is available to the next failover server, resulting in |
minimal application downtime and enhanced transactional security.
| |
| {productName} provides the following high availability features: |
| |
| * link:#gksdm[Load Balancing With the Apache `mod_jk` or `mod_proxy_ajp` Module] |
| * link:#gaynn[High Availability Session Persistence] |
| * link:#gayna[High Availability Java Message Service] |
| * link:#gaymz[RMI-IIOP Load Balancing and Failover] |
| |
| [[gksdm]][[GSHAG00252]][[load-balancing-with-the-apache-mod_jk-or-mod_proxy_ajp-module]] |
| |
| ==== Load Balancing With the Apache `mod_jk` or `mod_proxy_ajp` Module |
| |
| A common load balancing configuration for {productName} 7 is to use |
| the Apache HTTP Server as the web server front-end, and the Apache |
| `mod_jk` or `mod_proxy_ajp` module as the connector between the web |
| server and {productName}. See |
| link:http-load-balancing.html#gksdt[Configuring {productName} with |
| Apache HTTP Server and `mod_jk`] and |
| link:http-load-balancing.html#CHDCCGDC[Configuring {productName} with |
| Apache HTTP Server and `mod_proxy_ajp`] for more information. |
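
The linked chapters give the full procedures. As a minimal sketch, assuming the
conventional AJP port 8009 and placeholder cluster, listener, host, and context
names, the {productName} side needs a `jk`-enabled network listener and the
Apache side a matching proxy directive:

[source,shell]
----
# Create an AJP (jk-enabled) network listener on the cluster configuration so
# that mod_jk or mod_proxy_ajp can connect to the instances.
asadmin create-network-listener --target cluster1 --protocol http-listener-1 \
  --listenerport 8009 --jkenabled true jk-connector

# Corresponding minimal Apache httpd directive when using mod_proxy_ajp:
#   ProxyPass "/myapp" "ajp://instance-host.example.com:8009/myapp"
----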
| |
| [[gaynn]][[GSHAG00253]][[high-availability-session-persistence]] |
| |
| ==== High Availability Session Persistence |
| |
| {productName} provides high availability of HTTP requests and session |
| data (both HTTP session data and stateful session bean data). |
| |
| Jakarta EE applications typically have significant amounts of session state |
data. A web shopping cart is the classic example of session state.
Also, an application can cache frequently needed data in the session
| object. In fact, almost all applications with significant user |
| interactions need to maintain session state. Both HTTP sessions and |
| stateful session beans (SFSBs) have session state data. |
| |
| Preserving session state across server failures can be important to end |
| users. If the {productName} instance hosting the user session |
| experiences a failure, the session state can be recovered, and the |
| session can continue without loss of information. High availability is |
| implemented in {productName} by means of in-memory session |
| replication on {productName} instances running in a cluster. |
| |
| For more information about in-memory session replication in {productName}, see link:#gaymr[How {productName} Provides High |
| Availability]. For detailed instructions on configuring high |
| availability session persistence, see |
| link:session-persistence-and-failover.html#abdkz[Configuring High |
| Availability Session Persistence and Failover]. |
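
As a minimal sketch, an application that is marked `<distributable/>` in its
`web.xml` can request replicated session persistence when it is deployed to a
cluster; the cluster and archive names below are placeholders:

[source,shell]
----
# Deploy to a cluster with availability enabled so that HTTP session and
# stateful session bean state is replicated in memory across the cluster.
asadmin deploy --target cluster1 --availabilityenabled=true myapp.war
----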
| |
| [[gayna]][[GSHAG00254]][[high-availability-java-message-service]] |
| |
| ==== High Availability Java Message Service |
| |
| {productName} supports the Java Message Service (JMS) API and JMS |
messaging through its built-in `jmsra` resource adapter communicating with
| Open Message Queue as the JMS provider. This combination is often called |
| the JMS Service. |
| |
| The JMS service makes JMS messaging highly available as follows: |
| |
| Message Queue Broker Clusters:: |
| By default, when a GlassFish cluster is created, the JMS service |
| automatically configures a Message Queue broker cluster to provide JMS |
| messaging services, with one clustered broker assigned to each cluster |
| instance. This automatically created broker cluster is configurable to |
| take advantage of the two types of broker clusters, conventional and |
| enhanced, supported by Message Queue. + |
| Additionally, Message Queue broker clusters created and managed using |
| Message Queue itself can be used as external, or remote, JMS hosts. |
| Using external broker clusters provides additional deployment options, |
| such as deploying Message Queue brokers on different hosts from the |
| GlassFish instances they service, or deploying different numbers of |
| Message Queue brokers and GlassFish instances. + |
| For more information about Message Queue clustering, see |
| link:jms.html#abdbx[Using Message Queue Broker Clusters With {productName}]. |
| Connection Failover:: |
| The use of Message Queue broker clusters allows connection failover in |
| the event of a broker failure. If the primary JMS host (Message Queue |
| broker) in use by a GlassFish instance fails, connections to the |
| failed JMS host will automatically fail over to another host in the |
| JMS host list, allowing messaging operations to continue and |
| maintaining JMS messaging semantics. + |
| For more information about JMS connection failover, see |
| link:jms.html#abdbv[Connection Failover]. |
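
As a minimal sketch, the type of the automatically created broker cluster can
be chosen with the `configure-jms-cluster` subcommand, typically before the
cluster is started for the first time. The cluster name below is a placeholder,
and an enhanced cluster additionally requires a highly available database for
its shared data store:

[source,shell]
----
# Configure the Message Queue broker cluster that backs the GlassFish cluster
# as a conventional broker cluster (the default type).
asadmin configure-jms-cluster --clustertype=conventional cluster1
----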
| |
| [[gaymz]][[GSHAG00255]][[rmi-iiop-load-balancing-and-failover]] |
| |
| ==== RMI-IIOP Load Balancing and Failover |
| |
| With RMI-IIOP load balancing, IIOP client requests are distributed to |
| different server instances or name servers, which spreads the load |
| evenly across the cluster, providing scalability. IIOP load balancing |
| combined with EJB clustering and availability also provides EJB |
| failover. |
| |
| When a client performs a JNDI lookup for an object, the Naming Service |
| essentially binds the request to a particular server instance. From then |
| on, all lookup requests made from that client are sent to the same |
| server instance, and thus all `EJBHome` objects will be hosted on the |
| same target server. Any bean references obtained henceforth are also |
| created on the same target host. This effectively provides load |
| balancing, since all clients randomize the list of target servers when |
| performing JNDI lookups. If the target server instance goes down, the |
| lookup or EJB method invocation will failover to another server |
| instance. |
| |
IIOP load balancing and failover happen transparently. No special steps
are needed during application deployment. If the {productName}
instance on which the application client is deployed participates in a
cluster, {productName} finds all currently active IIOP endpoints
| in the cluster automatically. However, a client should have at least two |
| endpoints specified for bootstrapping purposes, in case one of the |
| endpoints has failed. |
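
For a stand-alone Java client, the bootstrap endpoints can be supplied through
the `com.sun.appserv.iiop.endpoints` system property; the host names below are
placeholders, and 3700 is the default IIOP listener port:

[source,shell]
----
# Give the application client at least two IIOP endpoints to bootstrap from,
# so the initial lookup succeeds even if one endpoint is down.
java -Dcom.sun.appserv.iiop.endpoints=host1:3700,host2:3700 -jar myclient.jar
----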
| |
| For more information on RMI-IIOP load balancing and failover, see |
| link:rmi-iiop.html#fxxqs[RMI-IIOP Load Balancing and Failover]. |
| |
| [[gaymr]][[GSHAG00169]][[how-glassfish-server-provides-high-availability]] |
| |
| === How {productName} Provides High Availability |
| |
| {productName} provides high availability through the following |
| subcomponents and features: |
| |
| * link:#gjghv[Storage for Session State Data] |
| * link:#abdax[Highly Available Clusters] |
| |
| [[gjghv]][[GSHAG00256]][[storage-for-session-state-data]] |
| |
| ==== Storage for Session State Data |
| |
| Storing session state data enables the session state to be recovered |
| after the failover of a server instance in a cluster. Recovering the |
| session state enables the session to continue without loss of |
| information. {productName} supports in-memory session replication on |
| other servers in the cluster for maintaining HTTP session and stateful |
| session bean data. |
| |
| In-memory session replication is implemented in {productName} 7 as |
| an OSGi module. Internally, the replication module uses a consistent |
| hash algorithm to pick a replica server instance within a cluster of |
| instances. This allows the replication module to easily locate the |
| replica or replicated data when a container needs to retrieve the data. |
| |
| The use of in-memory replication requires the Group Management Service |
| (GMS) to be enabled. For more information about GMS, see |
| link:clusters.html#gjfnl[Group Management Service]. |
| |
| If server instances in a cluster are located on different hosts, ensure |
| that the following prerequisites are met: |
| |
| * To ensure that GMS and in-memory replication function correctly, the |
| hosts must be on the same subnet. |
| * To ensure that in-memory replication functions correctly, the system |
| clocks on all hosts in the cluster must be synchronized as closely as |
| possible. |
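
One way to check the first prerequisite is the `validate-multicast` subcommand,
which verifies that the hosts can exchange the UDP multicast traffic that GMS
uses by default. Run it on each host at the same time; the multicast address
and port shown below are only examples:

[source,shell]
----
# Run on every host that will carry a clustered instance; each host should
# report receiving messages from the others.
asadmin validate-multicast --multicastaddress 228.9.3.1 --multicastport 2048
----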
| |
| [[abdax]][[GSHAG00257]][[highly-available-clusters]] |
| |
| ==== Highly Available Clusters |
| |
A highly available cluster integrates a state replication service with
clusters and a load balancer.
| |
| |
| [NOTE] |
| ==== |
| When implementing a highly available cluster, use a load balancer that |
| includes session-based stickiness as part of its load-balancing |
| algorithm. Otherwise, session data can be misdirected or lost. |
An example of a load balancer that includes session-based stickiness is the
Load Balancer Plug-In available in {productName}.
| ==== |
| |
| |
| [[abday]][[GSHAG00218]][[clusters-instances-sessions-and-load-balancing]] |
| |
| ===== Clusters, Instances, Sessions, and Load Balancing |
| |
| Clusters, server instances, load balancers, and sessions are related as |
| follows: |
| |
| * A server instance is not required to be part of a cluster. However, an |
| instance that is not part of a cluster cannot take advantage of high |
| availability through transfer of session state from one instance to |
| other instances. |
| * The server instances within a cluster can be hosted on one or multiple |
| hosts. You can group server instances across different hosts into a |
| cluster. |
| * A particular load balancer can forward requests to server instances on |
| multiple clusters. You can use this ability of the load balancer to |
| perform an online upgrade without loss of service. For more information, |
| see link:rolling-upgrade.html#abdin[Upgrading in Multiple Clusters]. |
| * A single cluster can receive requests from multiple load balancers. If |
| a cluster is served by more than one load balancer, you must configure |
| the cluster in exactly the same way on each load balancer. |
| * Each session is tied to a particular cluster. Therefore, although you |
| can deploy an application on multiple clusters, session failover will |
| occur only within a single cluster. |
| |
| The cluster thus acts as a safe boundary for session failover for the |
server instances within the cluster. You can use the load balancer and the
cluster to upgrade components within {productName} without loss of service.
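
As a minimal sketch, the following commands group instances on two previously
created nodes (and therefore potentially two hosts) into one cluster; all names
are placeholders:

[source,shell]
----
# Create the cluster, add one instance per node, and start everything.
asadmin create-cluster cluster1
asadmin create-instance --cluster cluster1 --node node1 instance1
asadmin create-instance --cluster cluster1 --node node2 instance2
asadmin start-cluster cluster1
----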
| |
| [[gktax]][[GSHAG00219]][[protocols-for-centralized-cluster-administration]] |
| |
| ===== Protocols for Centralized Cluster Administration |
| |
| {productName} uses the Distributed Component Object Model (DCOM) |
| remote protocol or secure shell (SSH) to ensure that clusters that span |
| multiple hosts can be administered centrally. To perform administrative |
| operations on {productName} instances that are remote from the domain |
| administration server (DAS), the DAS must be able to communicate with |
| those instances. If an instance is running, the DAS connects to the |
| running instance directly. For example, when you deploy an application |
| to an instance, the DAS connects to the instance and deploys the |
| application to the instance. |
| |
| However, the DAS cannot connect to an instance to perform operations on |
| an instance that is not running, such as creating or starting the |
| instance. For these operations, the DAS uses DCOM or SSH to contact a |
| remote host and administer instances there. DCOM or SSH provides |
| confidentiality and security for data that is exchanged between the DAS |
| and remote hosts. |
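
As a minimal sketch, assuming SSH is used, key-based authentication from the
DAS host to a remote host can be set up with the `setup-ssh` subcommand, after
which the remote host is represented by an SSH node; the host and node names
below are placeholders:

[source,shell]
----
# Distribute the DAS user's SSH key to the remote host, then register the host
# as an SSH node on which remote instances can be created and started.
asadmin setup-ssh remotehost.example.com
asadmin create-node-ssh --nodehost remotehost.example.com node2
----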
| |
| |
| [NOTE] |
| ==== |
| The use of DCOM or SSH to enable centralized administration of remote |
instances is optional. If the use of DCOM or SSH is not feasible in your
| environment, you can administer remote instances locally. |
| ==== |
| |
| |
| For more information, see link:ssh-setup.html#gkshg[Enabling Centralized |
| Administration of {productName} Instances]. |
| |
| [[gbcot]][[GSHAG00170]][[recovering-from-failures]] |
| |
| === Recovering from Failures |
| |
| You can use various techniques to manually recover individual |
| subcomponents after hardware failures such as disk crashes. |
| |
| The following topics are addressed here: |
| |
| * link:#gcmkp[Recovering the Domain Administration Server] |
| * link:#gcmkc[Recovering {productName} Instances] |
| * link:#gcmjs[Recovering the HTTP Load Balancer and Web Server] |
| * link:#gcmjr[Recovering Message Queue] |
| |
| [[gcmkp]][[GSHAG00258]][[recovering-the-domain-administration-server]] |
| |
| ==== Recovering the Domain Administration Server |
| |
| Loss of the Domain Administration Server (DAS) affects only |
| administration. {productName} clusters and standalone instances, and |
| the applications deployed to them, continue to run as before, even if |
the DAS is not reachable.
| |
| Use any of the following methods to recover the DAS: |
| |
* Back up the domain periodically, so that you have recent snapshots. After
a hardware failure, re-create the DAS on a new host, as described in
"link:administration-guide/domains.html#GSADG00542[Re-Creating the Domain Administration Server (DAS)]"
in {productName} Administration Guide. A command sketch follows this list.
| * Put the domain installation and configuration on a shared and robust |
file system (NFS, for example). If the primary DAS host fails, a second
host with the same IP address is brought up and takes over, either through
manual intervention or user-supplied automation.
| * Zip the {productName} installation and domain root directory. |
| Restore it on the new host, assigning it the same network identity. |
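
For the first approach, the `backup-domain` and `restore-domain` subcommands
take and restore domain snapshots. The domain name and backup directory below
are placeholders, and the DAS is typically stopped while the backup is taken:

[source,shell]
----
# On the original DAS host, take a periodic snapshot of the domain:
asadmin backup-domain --backupdir /export/backups domain1

# After a failure, on the replacement DAS host:
asadmin restore-domain --backupdir /export/backups domain1
----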
| |
| [[gcmkc]][[GSHAG00259]][[recovering-glassfish-server-instances]] |
| |
| ==== Recovering {productName} Instances |
| |
{productName} provides tools for backing up and restoring {productName} instances. For more information, see link:instances.html#gksdy[To
| Resynchronize an Instance and the DAS Offline]. |
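
As a minimal sketch of the offline resynchronization described in that
procedure, the instance's configuration can be exported from the DAS as a
synchronization bundle and imported on the instance's host; names and paths
below are placeholders:

[source,shell]
----
# On the DAS host, export the configuration for the instance (or its cluster):
asadmin export-sync-bundle --target instance1 /tmp/instance1-sync.zip

# Copy the bundle to the instance's host, then import it there:
asadmin import-sync-bundle --node node1 --file /tmp/instance1-sync.zip instance1
----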
| |
| [[gcmjs]][[GSHAG00260]][[recovering-the-http-load-balancer-and-web-server]] |
| |
| ==== Recovering the HTTP Load Balancer and Web Server |
| |
| There are no explicit commands to back up only a web server |
| configuration. Simply zip the web server installation directory. After |
| failure, unzip the saved backup on a new host with the same network |
| identity. If the new host has a different IP address, update the DNS |
| server or the routers. |
| |
| |
| [NOTE] |
| ==== |
| This assumes that the web server is either reinstalled or restored from |
| an image first. |
| ==== |
| |
| |
| The Load Balancer Plug-In (`plugins` directory) and configurations are |
| in the web server installation directory, typically `/opt/SUNWwbsvr`. |
The `web-install/web-instance/config` directory contains the
`loadbalancer.xml` file.
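
A minimal sketch of such a backup, assuming a Unix-like host and the default
installation path mentioned above:

[source,shell]
----
# Archive the entire web server installation, including the load balancer
# plug-in and loadbalancer.xml:
tar -czf webserver-backup.tar.gz /opt/SUNWwbsvr

# After a failure, restore the archive on a host with the same network identity:
tar -xzf webserver-backup.tar.gz -C /
----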
| |
| [[gcmjr]][[GSHAG00261]][[recovering-message-queue]] |
| |
| ==== Recovering Message Queue |
| |
| When a Message Queue broker becomes unavailable, the method you use to |
| restore the broker to operation depends on the nature of the failure |
| that caused the broker to become unavailable: |
| |
| * Power failure or failure other than disk storage |
| * Failure of disk storage |
| |
| Additionally, the urgency of restoring an unavailable broker to |
| operation depends on the type of the broker: |
| |
| * Standalone Broker. When a standalone broker becomes unavailable, both |
| service availability and data availability are interrupted. Restore the |
| broker to operation as soon as possible to restore availability. |
| * Broker in a Conventional Cluster. When a broker in a conventional |
| cluster becomes unavailable, service availability continues to be |
| provided by the other brokers in the cluster. However, data availability |
| of the persistent data stored by the unavailable broker is interrupted. |
| Restore the broker to operation to restore availability of its |
| persistent data. |
| * Broker in an Enhanced Cluster. When a broker in an enhanced cluster |
| becomes unavailable, service availability and data availability continue |
| to be provided by the other brokers in the cluster. Restore the broker |
| to operation to return the cluster to its previous capacity. |
| |
| [[glaiv]][[GSHAG00220]][[recovering-from-power-failure-and-failures-other-than-disk-storage]] |
| |
| ===== Recovering From Power Failure and Failures Other Than Disk Storage |
| |
| When a host is affected by a power failure or failure of a non-disk |
| component such as memory, processor or network card, restore Message |
| Queue brokers on the affected host by starting the brokers after the |
| failure has been remedied. |
| |
| To start brokers serving as Embedded or Local JMS hosts, start the |
| GlassFish instances the brokers are servicing. To start brokers serving |
| as Remote JMS hosts, use the `imqbrokerd` Message Queue utility. |
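
A minimal sketch of both cases, with the broker, port, and instance names below
as placeholders:

[source,shell]
----
# Broker serving as a Remote JMS host, started directly with Message Queue:
imqbrokerd -name broker1 -port 7676

# Brokers serving as Embedded or Local JMS hosts start with their instances:
asadmin start-instance instance1
----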
| |
| [[glaiu]][[GSHAG00221]][[recovering-from-failure-of-disk-storage]] |
| |
| ===== Recovering from Failure of Disk Storage |
| |
| Message Queue uses disk storage for software, configuration files and |
| persistent data stores. In a default GlassFish installation, all three |
| of these are generally stored on the same disk: the Message Queue |
software in `as-install-parent/mq`, and broker configuration files and
| persistent data stores (except for the persistent data stores of |
| enhanced clusters, which are housed in highly available databases) in |
`domain-dir/imq`. If this disk fails, restoring brokers to operation is
| impossible unless you have previously created a backup of these items. |
| To create such a backup, use a utility such as `zip`, `gzip` or `tar` to |
| create archives of these directories and all their content. When |
| creating the backup, you should first quiesce all brokers and physical |
| destinations, as described in "link:../openmq/mq-admin-guide/broker-management.html#GMADG00522[Quiescing a Broker]" and |
| "link:../openmq/mq-admin-guide/message-delivery.html#GMADG00533[Pausing and Resuming a Physical Destination]" in Open |
| Message Queue Administration Guide, respectively. Then, after the failed |
| disk is replaced and put into service, expand the backup archive into |
| the same location. |
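
A minimal sketch of such a backup on a Unix-like host, where `domain-dir`
stands for the actual domain directory:

[source,shell]
----
# After quiescing brokers and pausing physical destinations, archive the broker
# configuration files and persistent data stores:
tar -czf imq-backup.tar.gz domain-dir/imq

# After the failed disk has been replaced and put into service, expand the
# archive back into the same location:
tar -xzf imq-backup.tar.gz
----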
| |
| Restoring the Persistent Data Store From Backup. For many messaging |
| applications, restoring a persistent data store from backup does not |
| produce the desired results because the backed up store does not |
| represent the content of the store when the disk failure occurred. In |
| some applications, the persistent data changes rapidly enough to make |
| backups obsolete as soon as they are created. To avoid issues in |
| restoring a persistent data store, consider using a RAID or SAN data |
| storage solution that is fault tolerant, especially for data stores in |
| production environments. |
| |
| [[abdaz]][[GSHAG00171]][[more-information]] |
| |
| === More Information |
| |
| For information about planning a high-availability deployment, including |
| assessing hardware requirements, planning network configuration, and |
| selecting a topology, see the link:deployment-planning-guide.html#GSPLG[{productName} Deployment Planning Guide]. This manual also provides a |
| high-level introduction to concepts such as: |
| |
| * {productName} components such as node agents, domains, and clusters |
| * IIOP load balancing in a cluster |
| * Message queue failover |
| |
| For more information about developing applications that take advantage |
| of high availability features, see the link:application-development-guide.html#GSDVG[{productName} Application Development Guide]. |
| |
| For information on how to configure and tune applications and {productName} for best performance with high availability, see the |
| link:performance-tuning-guide.html#GSPTG[{productName} Performance Tuning |
| Guide], which discusses topics such as: |
| |
| * Tuning persistence frequency and persistence scope |
| * Checkpointing stateful session beans |
| * Configuring the JDBC connection pool |
| * Session size |
| * Configuring load balancers for best performance |