type=page
status=published
title=High Availability in {productName}
next=ssh-setup.html
prev=preface.html
~~~~~~
= High Availability in {productName}
[[GSHAG00002]][[abdaq]]
[[high-availability-in-glassfish-server]]
== 1 High Availability in {productName}
This chapter describes the high availability features in {productName} 7.
The following topics are addressed here:
* link:#abdar[Overview of High Availability]
* link:#gaymr[How {productName} Provides High Availability]
* link:#gbcot[Recovering from Failures]
* link:#abdaz[More Information]
[[abdar]][[GSHAG00168]][[overview-of-high-availability]]
=== Overview of High Availability
High availability applications and services provide their functionality
continuously, regardless of hardware and software failures. To make such
reliability possible, {productName} provides mechanisms for
maintaining application state data between clustered {productName}
instances. Application state data, such as HTTP session data, stateful
EJB sessions, and dynamic cache information, is replicated in real time
across server instances. If any one server instance goes down, the
session state is available to the next failover server, resulting in
minimum application downtime and enhanced transactional security.
{productName} provides the following high availability features:
* link:#gksdm[Load Balancing With the Apache `mod_jk` or `mod_proxy_ajp` Module]
* link:#gaynn[High Availability Session Persistence]
* link:#gayna[High Availability Java Message Service]
* link:#gaymz[RMI-IIOP Load Balancing and Failover]
[[gksdm]][[GSHAG00252]][[load-balancing-with-the-apache-mod_jk-or-mod_proxy_ajp-module]]
==== Load Balancing With the Apache `mod_jk` or `mod_proxy_ajp` Module
A common load balancing configuration for {productName} 7 is to use
the Apache HTTP Server as the web server front-end, and the Apache
`mod_jk` or `mod_proxy_ajp` module as the connector between the web
server and {productName}. See
link:http-load-balancing.html#gksdt[Configuring {productName} with
Apache HTTP Server and `mod_jk`] and
link:http-load-balancing.html#CHDCCGDC[Configuring {productName} with
Apache HTTP Server and `mod_proxy_ajp`] for more information.
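On the {productName} side, such a setup typically requires an AJP-enabled
network listener for the Apache module to connect to. The following listing is
only an illustrative sketch; the listener name, port, and cluster target are
placeholders, and the linked chapter gives the complete procedure.

----
# Create a jk-enabled (AJP) network listener that Apache mod_jk or
# mod_proxy_ajp can connect to (names and port are examples)
asadmin create-network-listener --target cluster1 \
  --protocol http-listener-1 --listenerport 8009 \
  --jkenabled true jk-connector
----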
[[gaynn]][[GSHAG00253]][[high-availability-session-persistence]]
==== High Availability Session Persistence
{productName} provides high availability of HTTP requests and session
data (both HTTP session data and stateful session bean data).
Jakarta EE applications typically have significant amounts of session state
data. A web shopping cart is the classic example of a session state.
Also, an application can cache frequently-needed data in the session
object. In fact, almost all applications with significant user
interactions need to maintain session state. Both HTTP sessions and
stateful session beans (SFSBs) have session state data.
Preserving session state across server failures can be important to end
users. If the {productName} instance hosting the user session
experiences a failure, the session state can be recovered, and the
session can continue without loss of information. High availability is
implemented in {productName} by means of in-memory session
replication on {productName} instances running in a cluster.
For more information about in-memory session replication in {productName}, see link:#gaymr[How {productName} Provides High
Availability]. For detailed instructions on configuring high
availability session persistence, see
link:session-persistence-and-failover.html#abdkz[Configuring High
Availability Session Persistence and Failover].
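As a brief illustration, in-memory session replication is typically enabled per
application at deployment time. The sketch below assumes a cluster named
`cluster1` and an application archive named `myapp.war`; the linked chapter
describes the complete configuration.

----
# Deploy the application to the cluster with availability
# (in-memory session replication) enabled
asadmin deploy --target cluster1 --availabilityenabled=true myapp.war
----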
[[gayna]][[GSHAG00254]][[high-availability-java-message-service]]
==== High Availability Java Message Service
{productName} supports the Java Message Service (JMS) API and JMS
messaging through its built-in `jmsra` resource adapter, which communicates
with Open Message Queue as the JMS provider. This combination is often
called the JMS service.
The JMS service makes JMS messaging highly available as follows:
Message Queue Broker Clusters::
By default, when a GlassFish cluster is created, the JMS service
automatically configures a Message Queue broker cluster to provide JMS
messaging services, with one clustered broker assigned to each cluster
instance. This automatically created broker cluster is configurable to
take advantage of the two types of broker clusters, conventional and
enhanced, supported by Message Queue. +
Additionally, Message Queue broker clusters created and managed using
Message Queue itself can be used as external, or remote, JMS hosts.
Using external broker clusters provides additional deployment options,
such as deploying Message Queue brokers on different hosts from the
GlassFish instances they service, or deploying different numbers of
Message Queue brokers and GlassFish instances. +
For more information about Message Queue clustering, see
link:jms.html#abdbx[Using Message Queue Broker Clusters With {productName}].
Connection Failover::
The use of Message Queue broker clusters allows connection failover in
the event of a broker failure. If the primary JMS host (Message Queue
broker) in use by a GlassFish instance fails, connections to the
failed JMS host will automatically fail over to another host in the
JMS host list, allowing messaging operations to continue and
maintaining JMS messaging semantics. +
For more information about JMS connection failover, see
link:jms.html#abdbv[Connection Failover].
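As an illustrative sketch, an external (remote) Message Queue broker can be
registered as an additional JMS host for a cluster, and the JMS service can
then be verified with `jms-ping`. The broker host, port, and names below are
placeholders.

----
# Register an external Message Queue broker as a JMS host for the cluster
asadmin create-jms-host --target cluster1 \
  --mqhost broker1.example.com --mqport 7676 extBroker1

# Verify that the JMS service for the cluster can reach a JMS host
asadmin jms-ping --target cluster1
----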
[[gaymz]][[GSHAG00255]][[rmi-iiop-load-balancing-and-failover]]
==== RMI-IIOP Load Balancing and Failover
With RMI-IIOP load balancing, IIOP client requests are distributed to
different server instances or name servers, which spreads the load
evenly across the cluster, providing scalability. IIOP load balancing
combined with EJB clustering and availability also provides EJB
failover.
When a client performs a JNDI lookup for an object, the Naming Service
essentially binds the request to a particular server instance. From then
on, all lookup requests made from that client are sent to the same
server instance, and thus all `EJBHome` objects will be hosted on the
same target server. Any bean references obtained henceforth are also
created on the same target host. This effectively provides load
balancing, since all clients randomize the list of target servers when
performing JNDI lookups. If the target server instance goes down, the
lookup or EJB method invocation will failover to another server
instance.
IIOP load balancing and failover happen transparently. No special steps
are needed during application deployment. If the {productName}
instance on which the application client is deployed participates in a
cluster, the {productName} finds all currently active IIOP endpoints
in the cluster automatically. However, a client should have at least two
endpoints specified for bootstrapping purposes, in case one of the
endpoints has failed.
For more information on RMI-IIOP load balancing and failover, see
link:rmi-iiop.html#fxxqs[RMI-IIOP Load Balancing and Failover].
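For example, an application client can list two or more IIOP endpoints when it
bootstraps, so that a second endpoint is available if the first has failed. The
sketch below uses the application client container's `-targetserver` option
with placeholder host names; the linked chapter describes the supported options
in detail.

----
# Bootstrap the application client with two IIOP endpoints for failover
appclient -client myclient.jar -targetserver host1:3700,host2:3700
----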
[[gaymr]][[GSHAG00169]][[how-glassfish-server-provides-high-availability]]
=== How {productName} Provides High Availability
{productName} provides high availability through the following
subcomponents and features:
* link:#gjghv[Storage for Session State Data]
* link:#abdax[Highly Available Clusters]
[[gjghv]][[GSHAG00256]][[storage-for-session-state-data]]
==== Storage for Session State Data
Storing session state data enables the session state to be recovered
after the failover of a server instance in a cluster. Recovering the
session state enables the session to continue without loss of
information. {productName} supports in-memory session replication on
other servers in the cluster for maintaining HTTP session and stateful
session bean data.
In-memory session replication is implemented in {productName} 7 as
an OSGi module. Internally, the replication module uses a consistent
hash algorithm to pick a replica server instance within a cluster of
instances. This allows the replication module to easily locate the
replica or replicated data when a container needs to retrieve the data.
The use of in-memory replication requires the Group Management Service
(GMS) to be enabled. For more information about GMS, see
link:clusters.html#gjfnl[Group Management Service].
If server instances in a cluster are located on different hosts, ensure
that the following prerequisites are met:
* To ensure that GMS and in-memory replication function correctly, the
hosts must be on the same subnet.
* To ensure that in-memory replication functions correctly, the system
clocks on all hosts in the cluster must be synchronized as closely as
possible.
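A quick way to check these prerequisites is sketched below: `validate-multicast`
confirms that the hosts in the cluster can see each other's multicast traffic,
and the `gms-enabled` attribute shows whether GMS is on for the cluster. The
cluster name and dotted attribute name are shown as examples.

----
# Run on each host in the cluster, at the same time, to verify
# multicast connectivity for GMS
asadmin validate-multicast

# Confirm that GMS is enabled for the cluster (it is enabled by default)
asadmin get clusters.cluster.cluster1.gms-enabled
----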
[[abdax]][[GSHAG00257]][[highly-available-clusters]]
==== Highly Available Clusters
A highly available cluster integrates a state replication service with
clusters and a load balancer.
[NOTE]
====
When implementing a highly available cluster, use a load balancer that
includes session-based stickiness as part of its load-balancing
algorithm. Otherwise, session data can be misdirected or lost.
An example of a load balancer that includes session-based stickiness is the
Loadbalancer Plug-In available in {productName}.
====
[[abday]][[GSHAG00218]][[clusters-instances-sessions-and-load-balancing]]
===== Clusters, Instances, Sessions, and Load Balancing
Clusters, server instances, load balancers, and sessions are related as
follows:
* A server instance is not required to be part of a cluster. However, an
instance that is not part of a cluster cannot take advantage of high
availability through transfer of session state from one instance to
other instances.
* The server instances within a cluster can be hosted on one or multiple
hosts. You can group server instances across different hosts into a
cluster.
* A particular load balancer can forward requests to server instances on
multiple clusters. You can use this ability of the load balancer to
perform an online upgrade without loss of service. For more information,
see link:rolling-upgrade.html#abdin[Upgrading in Multiple Clusters].
* A single cluster can receive requests from multiple load balancers. If
a cluster is served by more than one load balancer, you must configure
the cluster in exactly the same way on each load balancer.
* Each session is tied to a particular cluster. Therefore, although you
can deploy an application on multiple clusters, session failover will
occur only within a single cluster.
The cluster thus acts as a safe boundary for session failover for the
server instances within the cluster. You can use the load balancer and
upgrade components within {productName} without loss of service.
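To make these relationships concrete, the following sketch creates a cluster
with one instance on each of two hosts and starts it; the cluster, node, and
instance names are placeholders. A session created on `instance1` can then fail
over only to another instance in the same cluster, such as `instance2`.

----
# Create a cluster and one instance on each of two nodes (hosts)
asadmin create-cluster cluster1
asadmin create-instance --cluster cluster1 --node node1 instance1
asadmin create-instance --cluster cluster1 --node node2 instance2

# Start all instances in the cluster
asadmin start-cluster cluster1
----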
[[gktax]][[GSHAG00219]][[protocols-for-centralized-cluster-administration]]
===== Protocols for Centralized Cluster Administration
{productName} uses the Distributed Component Object Model (DCOM)
remote protocol or secure shell (SSH) to ensure that clusters that span
multiple hosts can be administered centrally. To perform administrative
operations on {productName} instances that are remote from the domain
administration server (DAS), the DAS must be able to communicate with
those instances. If an instance is running, the DAS connects to the
running instance directly. For example, when you deploy an application
to an instance, the DAS connects to the instance and deploys the
application to the instance.
However, the DAS cannot connect to an instance to perform operations on
an instance that is not running, such as creating or starting the
instance. For these operations, the DAS uses DCOM or SSH to contact a
remote host and administer instances there. DCOM or SSH provides
confidentiality and security for data that is exchanged between the DAS
and remote hosts.
[NOTE]
====
The use of DCOM or SSH to enable centralized administration of remote
instances is optional. If the use of DCOM or SSH is not feasible in your
environment, you can administer remote instances locally.
====
For more information, see link:ssh-setup.html#gkshg[Enabling Centralized
Administration of {productName} Instances].
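As an illustrative sketch, SSH-based centralized administration usually
involves distributing the DAS host's SSH key to the remote hosts and then
creating SSH nodes that represent those hosts. The host names and installation
directory below are placeholders; the linked chapter gives the complete
procedure.

----
# Set up SSH key access from the DAS host to the remote hosts
asadmin setup-ssh host1.example.com host2.example.com

# Create an SSH node for a remote host so that the DAS can create,
# start, and stop instances there
asadmin create-node-ssh --nodehost host1.example.com \
  --installdir /opt/glassfish7 node1
----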
[[gbcot]][[GSHAG00170]][[recovering-from-failures]]
=== Recovering from Failures
You can use various techniques to manually recover individual
subcomponents after hardware failures such as disk crashes.
The following topics are addressed here:
* link:#gcmkp[Recovering the Domain Administration Server]
* link:#gcmkc[Recovering {productName} Instances]
* link:#gcmjs[Recovering the HTTP Load Balancer and Web Server]
* link:#gcmjr[Recovering Message Queue]
[[gcmkp]][[GSHAG00258]][[recovering-the-domain-administration-server]]
==== Recovering the Domain Administration Server
Loss of the Domain Administration Server (DAS) affects only
administration. {productName} clusters and standalone instances, and
the applications deployed to them, continue to run as before, even if
the DAS is not reachable.
Use any of the following methods to recover the DAS:
* Back up the domain periodically, so you have periodic snapshots. After
a hardware failure, re-create the DAS on a new host, as described in
"link:administration-guide/domains.html#GSADG00542[Re-Creating the Domain Administration Server (DAS)]"
in {productName} Administration Guide.
* Put the domain installation and configuration on a shared and robust
file system (NFS, for example). If the primary DAS host fails, a second
host with the same IP address can be brought up to take over, either through
manual intervention or through user-supplied automation.
* Zip the {productName} installation and domain root directory.
Restore it on the new host, assigning it the same network identity.
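For the periodic-snapshot approach, the `backup-domain` and `restore-domain`
subcommands can be used as sketched below. The domain name and backup file path
are placeholders, and the domain must be stopped while it is backed up or
restored.

----
# On the original DAS host, with the domain stopped: take a snapshot
asadmin backup-domain domain1

# On the replacement DAS host: restore the snapshot (file path is an example)
asadmin restore-domain --filename /backups/domain1_backup.zip domain1
----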
[[gcmkc]][[GSHAG00259]][[recovering-glassfish-server-instances]]
==== Recovering {productName} Instances
{productName} provides tools for backing up and restoring {productName} instances. For more information, see link:instances.html#gksdy[To
Resynchronize an Instance and the DAS Offline].
[[gcmjs]][[GSHAG00260]][[recovering-the-http-load-balancer-and-web-server]]
==== Recovering the HTTP Load Balancer and Web Server
There are no explicit commands to back up only a web server
configuration. Simply zip the web server installation directory. After
failure, unzip the saved backup on a new host with the same network
identity. If the new host has a different IP address, update the DNS
server or the routers.
[NOTE]
====
This assumes that the web server is either reinstalled or restored from
an image first.
====
The Load Balancer Plug-In (`plugins` directory) and configurations are
in the web server installation directory, typically `/opt/SUNWwbsvr`.
The web-install``/``web-instance``/config`` directory contains the
`loadbalancer.xml` file.
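A minimal sketch of such a backup, assuming the default installation directory
shown above:

----
# Archive the entire web server installation, including the plugins
# directory and the loadbalancer.xml configuration
tar -czf webserver-backup.tar.gz /opt/SUNWwbsvr

# After failure, restore the archive on a host with the same network identity
tar -xzf webserver-backup.tar.gz -C /
----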
[[gcmjr]][[GSHAG00261]][[recovering-message-queue]]
==== Recovering Message Queue
When a Message Queue broker becomes unavailable, the method you use to
restore the broker to operation depends on the nature of the failure
that caused the broker to become unavailable:
* Power failure or failure other than disk storage
* Failure of disk storage
Additionally, the urgency of restoring an unavailable broker to
operation depends on the type of the broker:
* Standalone Broker. When a standalone broker becomes unavailable, both
service availability and data availability are interrupted. Restore the
broker to operation as soon as possible to restore availability.
* Broker in a Conventional Cluster. When a broker in a conventional
cluster becomes unavailable, service availability continues to be
provided by the other brokers in the cluster. However, data availability
of the persistent data stored by the unavailable broker is interrupted.
Restore the broker to operation to restore availability of its
persistent data.
* Broker in an Enhanced Cluster. When a broker in an enhanced cluster
becomes unavailable, service availability and data availability continue
to be provided by the other brokers in the cluster. Restore the broker
to operation to return the cluster to its previous capacity.
[[glaiv]][[GSHAG00220]][[recovering-from-power-failure-and-failures-other-than-disk-storage]]
===== Recovering From Power Failure and Failures Other Than Disk Storage
When a host is affected by a power failure or failure of a non-disk
component such as memory, processor or network card, restore Message
Queue brokers on the affected host by starting the brokers after the
failure has been remedied.
To start brokers serving as Embedded or Local JMS hosts, start the
GlassFish instances the brokers are servicing. To start brokers serving
as Remote JMS hosts, use the `imqbrokerd` Message Queue utility.
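For example (the instance name, broker name, and port are placeholders):

----
# Brokers in Embedded or Local mode start with their GlassFish instances
asadmin start-instance instance1

# A Remote JMS host (standalone broker) is started directly with imqbrokerd
imqbrokerd -name broker1 -port 7676
----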
[[glaiu]][[GSHAG00221]][[recovering-from-failure-of-disk-storage]]
===== Recovering from Failure of Disk Storage
Message Queue uses disk storage for software, configuration files and
persistent data stores. In a default GlassFish installation, all three
of these are generally stored on the same disk: the Message Queue
software in as-install-parent``/mq``, and broker configuration files and
persistent data stores (except for the persistent data stores of
enhanced clusters, which are housed in highly available databases) in
domain-dir``/imq``. If this disk fails, restoring brokers to operation is
impossible unless you have previously created a backup of these items.
To create such a backup, use a utility such as `zip`, `gzip` or `tar` to
create archives of these directories and all their content. When
creating the backup, you should first quiesce all brokers and physical
destinations, as described in "link:../openmq/mq-admin-guide/broker-management.html#GMADG00522[Quiescing a Broker]" and
"link:../openmq/mq-admin-guide/message-delivery.html#GMADG00533[Pausing and Resuming a Physical Destination]" in Open
Message Queue Administration Guide, respectively. Then, after the failed
disk is replaced and put into service, expand the backup archive into
the same location.
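A rough sketch of such a backup follows; the broker address, destination name,
and directory paths are examples standing in for the as-install-parent and
domain-dir locations described above.

----
# Quiesce the broker and pause a physical destination before backing up
imqcmd quiesce bkr -b mqhost.example.com:7676
imqcmd pause dst -b mqhost.example.com:7676 -t q -n myQueue

# Archive the Message Queue software, configuration files, and persistent store
tar -czf mq-backup.tar.gz /opt/glassfish7/mq \
  /opt/glassfish7/glassfish/domains/domain1/imq
----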
Restoring the Persistent Data Store From Backup. For many messaging
applications, restoring a persistent data store from backup does not
produce the desired results because the backed up store does not
represent the content of the store when the disk failure occurred. In
some applications, the persistent data changes rapidly enough to make
backups obsolete as soon as they are created. To avoid issues in
restoring a persistent data store, consider using a RAID or SAN data
storage solution that is fault tolerant, especially for data stores in
production environments.
[[abdaz]][[GSHAG00171]][[more-information]]
=== More Information
For information about planning a high-availability deployment, including
assessing hardware requirements, planning network configuration, and
selecting a topology, see the link:deployment-planning-guide.html#GSPLG[{productName} Deployment Planning Guide]. This manual also provides a
high-level introduction to concepts such as:
* {productName} components such as node agents, domains, and clusters
* IIOP load balancing in a cluster
* Message queue failover
For more information about developing applications that take advantage
of high availability features, see the link:application-development-guide.html#GSDVG[{productName} Application Development Guide].
For information on how to configure and tune applications and {productName} for best performance with high availability, see the
link:performance-tuning-guide.html#GSPTG[{productName} Performance Tuning
Guide], which discusses topics such as:
* Tuning persistence frequency and persistence scope
* Checkpointing stateful session beans
* Configuring the JDBC connection pool
* Session size
* Configuring load balancers for best performance