LLM High Availability

LLM's high-throughput, low-latency messaging services are combined with an extensive set of high availability (HA) features. LLM's HA layer is called RCMS (reliable consistent message streaming). RCMS uses active and semi-active replication techniques to provide high availability. This means that multiple instances of the application run simultaneously, as shown in the figure below. RCMS ensures that all instances of the application receive exactly the same input and can therefore maintain identical states. If one of the instances fails, RCMS detects the failure and automatically takes corrective action.

Sample configuration of RCMS services

Some of the main features of RCMS include the following:

  • Total order. This feature enforces consistent delivery of messages from a number of independent data transmitters to multiple receivers, meaning that all receivers are delivered exactly the same ordered stream of incoming messages.
  • Fast failure detection and failover. RCMS automatically detects the failure of an instance and performs the actions needed to fail over to a different instance. Because RCMS maintains an active backup, failover is extremely fast.
  • Control over replication set and replication method. RCMS allows the user to configure the number of replicas in an RCMS tier (replication set). RCMS supports virtual and real synchrony-based replication or a hybrid of virtual and real synchrony.
  • New component priming. RCMS allows a new application component to be added dynamically to a tier of existing similar components. RCMS automatically synchronizes the state of the component's incoming and outgoing traffic and helps synchronize the state of the application itself. As a result, the new component can begin operating fully in parallel with its running peers and can take over for them in case of failure.
  • Message loss prevention and synchronization. RCMS can be configured to prevent any message loss or to synchronize the tier members in case message loss is detected.
  • Split-brain prevention. RCMS provides a set of tools to deal with split-brain (network partition) situations. Split-brain prevention is important for ensuring the consistency of the HA service.
  • Intra-Tier communication. RCMS allows communication among tier members, such as members that form a replica set. RCMS provides a mechanism to ensure that external actions that affect the application state, such as time-based operations, non-deterministic operations, application synchronization points, or interactions with external data sources, are performed in sync by all tier members.
  • Efficient handling of network resources. RCMS uses network resources efficiently by using multicast to deliver the same set of messages to all tier members and by avoiding the generation of duplicate outgoing messages from the backup tier members.
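
The total order and active replication ideas above can be illustrated with a small sketch. This is not the RCMS API, and it does not show how RCMS is implemented. It assumes a simple fixed-sequencer scheme, one common way to obtain total order, and every name in it (Replica, Sequencer, run_demo) is hypothetical. Each message is stamped with a global sequence number and delivered to every tier member in that order, so deterministic replicas end up in identical states.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical tier member: a deterministic state machine."""
    name: str
    balance: int = 0
    applied: list = field(default_factory=list)

    def deliver(self, seq: int, payload: int) -> None:
        # Accept messages strictly in sequence order; a gap or a
        # reordering would let replica states diverge.
        if seq != len(self.applied):
            raise ValueError(f"{self.name}: expected seq {len(self.applied)}, got {seq}")
        self.applied.append((seq, payload))
        self.balance += payload


class Sequencer:
    """Hypothetical sequencer (an assumption, not an RCMS component).

    It assigns one global sequence number per message, so that all
    replicas see the same ordered stream.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.next_seq = 0

    def publish(self, payload: int) -> None:
        seq = self.next_seq
        self.next_seq += 1
        # Every tier member receives the same (seq, payload) pair.
        for replica in self.replicas:
            replica.deliver(seq, payload)


def run_demo():
    primary, backup = Replica("primary"), Replica("backup")
    sequencer = Sequencer([primary, backup])
    # Messages from independent transmitters, in the order they arrive.
    for payload in (10, -3, 7, 5):
        sequencer.publish(payload)
    return primary, backup


if __name__ == "__main__":
    primary, backup = run_demo()
    print(primary.applied == backup.applied, primary.balance, backup.balance)
```

Because both replicas apply the same messages in the same order, the backup holds the same state as the primary at every step. That is what allows a backup to take over without a lengthy recovery phase.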