This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Apache Helix is a generic cluster management framework for automatic management of partitioned, replicated, and distributed resources. It handles resource/partition assignment to nodes, failure detection and recovery, dynamic resource/node addition, pluggable state machines for state transitions, and automatic load balancing with throttling.
- Full build: `mvn clean install`
- Build without tests: `mvn clean install -Dmaven.test.skip.exec=true`
- Run specific module tests: `mvn test -pl helix-core`
- Run specific test class: `mvn test -pl helix-core -Dtest=TestClassName`
- Run specific test method: `mvn test -pl helix-core -Dtest=TestClassName#testMethodName`
- Integration tests: `mvn verify -pl helix-core -P integration-test`
- Frontend (Angular/Node 14, Yarn 1.22+): `cd helix-front && yarn install && yarn test`
- Frontend e2e: `cd helix-front && yarn cypress:open`
Build order matters — modules listed earlier are dependencies of later ones:
| Module | Purpose |
|---|---|
| metrics-common | Common metrics and monitoring interfaces (JMX MBeans) |
| metadata-store-directory-common | Shared metadata store directory utilities |
| zookeeper-api | ZooKeeper abstraction layer; wraps raw ZK client |
| helix-common | Shared utilities, data models (ZNRecord), exceptions, constants |
| helix-core | Main engine: controller, participant, spectator, rebalancers, state machines, pipeline |
| helix-rest | REST API server for cluster management operations |
| helix-lock | Distributed lock primitives built on ZooKeeper |
| helix-agent | Command execution agent for running tasks on cluster nodes |
| recipes | Example implementations (distributed lock manager, etc.) |
| helix-view-aggregator | Aggregates customized views across the cluster |
| meta-client | Client for interacting with metadata stores |
| helix-front | Angular frontend UI for cluster administration |
Every process joins a Helix cluster via HelixManagerFactory.getZKHelixManager() in one of three roles:
- PARTICIPANT: cluster nodes that host resource partitions and execute state transitions
- CONTROLLER: manages cluster state, computes rebalancing, dispatches state transition messages
- SPECTATOR: reads cluster state for client-side routing (e.g., `RoutingTableProvider`)
All cluster state lives as ZNodes under /{clusterName}/:
- `CONFIGS/`: cluster, instance, and resource configuration (persistent)
- `LIVEINSTANCES/`: ephemeral nodes; appear on connect, vanish on disconnect
- `INSTANCES/`: instance definitions plus per-instance subtrees:
  - `{instance}/MESSAGES/`: state transition commands from controller
  - `{instance}/CURRENTSTATES/`: actual partition states reported by participant
  - `{instance}/ERRORS/`: error reports
- `IDEALSTATES/`: desired partition-to-instance assignments (persistent, cached)
- `EXTERNALVIEW/`: controller-computed actual state (for spectators)
- `STATEMODELDEFS/`: state machine definitions (persistent, cached)
- `CONTROLLER/`: leader election (ephemeral), history, pause/maintenance signals
Key properties like CONFIGS, IDEALSTATES, STATEMODELDEFS, and CURRENTSTATES are cached in-memory for performance.
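To make the layout concrete, here is a minimal sketch of how the paths above compose. `ZNodePathSketch` and its methods are hypothetical names introduced for illustration only; Helix's own path logic lives in `PropertyPathBuilder.java` and may differ in naming and detail.

```java
// Hypothetical helper for illustration; not part of Helix.
final class ZNodePathSketch {
    private ZNodePathSketch() {}

    // Cluster-level node such as CONFIGS, IDEALSTATES, or EXTERNALVIEW.
    static String clusterNode(String cluster, String node) {
        return "/" + cluster + "/" + node;
    }

    // Per-instance subtree such as MESSAGES, CURRENTSTATES, or ERRORS.
    static String instanceNode(String cluster, String instance, String subtree) {
        return clusterNode(cluster, "INSTANCES") + "/" + instance + "/" + subtree;
    }

    public static void main(String[] args) {
        // Cluster and instance names below are made up for the example.
        System.out.println(clusterNode("MyCluster", "IDEALSTATES"));
        System.out.println(instanceNode("MyCluster", "host_12000", "MESSAGES"));
    }
}
```

With the made-up names used in `main`, this prints `/MyCluster/IDEALSTATES` and `/MyCluster/INSTANCES/host_12000/MESSAGES`.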
The controller processes cluster events through an ordered pipeline of stages (in helix-core/.../controller/stages/):
1. `ReadClusterDataStage`: read cluster state from ZK into cache
2. `ResourceComputationStage`: determine which resources need rebalancing
3. `CurrentStateComputationStage`: compile current state from all instances
4. `BestPossibleStateCalcStage`: invoke Rebalancer to compute ideal assignment
5. `IntermediateStateCalcStage`: calculate intermediate states for safe transitions
6. `MessageGenerationPhase` → `MessageSelectionStage` → `MessageThrottleStage` → `MessageDispatchStage`: generate, filter, throttle, and send state transition messages
7. `ExternalViewComputeStage`: update ExternalView for spectators
Events are queued and processed sequentially. Multiple events of the same type coalesce.
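The queueing and coalescing behavior can be pictured with a small stand-alone model. This is only a sketch under simplifying assumptions: `PipelineSketch` is a hypothetical class, events are plain strings, and stages are represented by name rather than by the real stage classes in `controller/stages/`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not Helix's controller implementation.
final class PipelineSketch {
    // Stage names in the order listed above.
    static final List<String> STAGES = Collections.unmodifiableList(Arrays.asList(
        "ReadClusterDataStage", "ResourceComputationStage",
        "CurrentStateComputationStage", "BestPossibleStateCalcStage",
        "IntermediateStateCalcStage", "MessageGenerationPhase",
        "MessageSelectionStage", "MessageThrottleStage",
        "MessageDispatchStage", "ExternalViewComputeStage"));

    // Insertion-ordered set: a second event of the same type is absorbed.
    private final Set<String> pending = new LinkedHashSet<>();
    final List<String> log = new ArrayList<>();

    // Returns false when an event of this type is already queued (coalesced).
    boolean enqueue(String eventType) {
        return pending.add(eventType);
    }

    // Processes queued events one at a time, running every stage in order.
    void drain() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        for (String event : batch) {
            for (String stage : STAGES) {
                log.add(event + ":" + stage);
            }
        }
    }

    public static void main(String[] args) {
        PipelineSketch p = new PipelineSketch();
        p.enqueue("IdealStateChange");
        p.enqueue("IdealStateChange"); // coalesced with the first
        p.enqueue("LiveInstanceChange");
        p.drain();
        System.out.println(p.log.size()); // 2 events x 10 stages
    }
}
```

The event type strings are illustrative labels. The point of the sketch is only that duplicates collapse before the pipeline runs and that each surviving event passes through the stages in a fixed order.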
State models define valid states and transitions. Built-in models in helix-core/.../model/builder/:
- MasterSlave — Master(1), Slave(R-1), Offline(N-R). Used for databases.
- LeaderStandby — Leader(1), Standby(R-1), Offline(N-R). Used for services.
- OnlineOffline — Online(R), Offline(N-R). Simplest model.
- Task — RUNNING, COMPLETED, FAILED. For job execution.
Participants implement state transitions via @Transition(from, to) annotated methods on StateModel subclasses, registered through StateModelFactory on the StateMachineEngine.
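As a worked example of the replica counts in the list above, the sketch below reads N as the number of candidate instances for a partition and R as the replica count. The text does not define these symbols explicitly, so that reading is an assumption, and `StateCountSketch` is a hypothetical class rather than anything in Helix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the MasterSlave counts; not Helix code.
final class StateCountSketch {
    private StateCountSketch() {}

    // n = candidate instances, r = replicas; assumes 1 <= r <= n.
    static Map<String, Integer> masterSlaveCounts(int n, int r) {
        if (r < 1 || r > n) {
            throw new IllegalArgumentException("require 1 <= r <= n");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("MASTER", 1);      // exactly one master per partition
        counts.put("SLAVE", r - 1);   // remaining replicas
        counts.put("OFFLINE", n - r); // instances not hosting a replica
        return counts;
    }

    public static void main(String[] args) {
        // Prints {MASTER=1, SLAVE=2, OFFLINE=2}
        System.out.println(masterSlaveCounts(5, 3));
    }
}
```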
Rebalance modes (set on IdealState):
- FULL_AUTO: controller computes both partition→instance mapping and state assignment
- SEMI_AUTO: operator provides preference lists; controller assigns states
- CUSTOMIZED: operator fully controls the ideal state
- USER_DEFINED: custom `Rebalancer` implementation class
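To illustrate the SEMI_AUTO split of responsibilities, the following simplified sketch takes an operator-supplied preference list and lets the "controller" pick states for the live instances. It assumes a MasterSlave-style model where the first live instance in the list becomes master; `SemiAutoSketch` is hypothetical, and the real rebalancers handle many cases this ignores (state constraints, disabled instances, throttling).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of SEMI_AUTO state assignment; not Helix code.
final class SemiAutoSketch {
    private SemiAutoSketch() {}

    static Map<String, String> assignStates(List<String> preferenceList,
                                            Set<String> liveInstances) {
        Map<String, String> result = new LinkedHashMap<>();
        boolean masterAssigned = false;
        for (String instance : preferenceList) {
            if (!liveInstances.contains(instance)) {
                continue; // instances that are down receive no replica state
            }
            result.put(instance, masterAssigned ? "SLAVE" : "MASTER");
            masterAssigned = true;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> prefs = Arrays.asList("n1", "n2", "n3");
        Set<String> live = new HashSet<>(Arrays.asList("n2", "n3"));
        // n1 is down, so n2 is promoted: prints {n2=MASTER, n3=SLAVE}
        System.out.println(assignStates(prefs, live));
    }
}
```

In FULL_AUTO the preference list itself would also be computed by the controller, while in CUSTOMIZED the operator would supply the final instance-to-state map directly.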
Key rebalancer implementations in helix-core/.../controller/rebalancer/:
- `AutoRebalancer`: default FULL_AUTO rebalancer
- `DelayedAutoRebalancer`: adds recovery delay to avoid flapping
- `WagedRebalancer` (`waged/` subpackage): weighted, globally-efficient multi-objective rebalancer with capacity constraints
- `CrushRebalanceStrategy`: CRUSH-based consistent hashing for placement
Controller computes transition → writes Message to /{cluster}/INSTANCES/{instance}/MESSAGES/
→ Participant reads message → HelixTaskExecutor dispatches to StateModel handler
→ Handler executes @Transition method → updates CURRENTSTATES → deletes message
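The flow above can be traced with an in-memory stand-in for the two per-instance ZNode subtrees. This is a rough sketch only: `MessageFlowSketch` is a hypothetical class, the maps replace ZooKeeper, and the actual handler invocation (done in Helix by `HelixTaskExecutor` and the `@Transition` method) is reduced to a comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the message flow; not Helix code.
final class MessageFlowSketch {
    // instance -> queued {partition, toState} pairs (stands in for MESSAGES/)
    final Map<String, Deque<String[]>> messages = new HashMap<>();
    // instance -> partition -> state (stands in for CURRENTSTATES/)
    final Map<String, Map<String, String>> currentStates = new HashMap<>();

    // Controller side: write a transition message for one instance.
    void controllerSend(String instance, String partition, String toState) {
        messages.computeIfAbsent(instance, k -> new ArrayDeque<>())
                .add(new String[] {partition, toState});
    }

    // Participant side: handle each message, record the new state, then delete it.
    int participantProcess(String instance) {
        Deque<String[]> queue = messages.getOrDefault(instance, new ArrayDeque<>());
        int handled = 0;
        while (!queue.isEmpty()) {
            String[] msg = queue.peek();
            // The state model's transition handler would run here.
            currentStates.computeIfAbsent(instance, k -> new HashMap<>())
                         .put(msg[0], msg[1]);
            queue.poll(); // message is removed only after the state is recorded
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        MessageFlowSketch flow = new MessageFlowSketch();
        flow.controllerSend("n1", "p0", "ONLINE");
        flow.participantProcess("n1");
        System.out.println(flow.currentStates.get("n1").get("p0")); // ONLINE
    }
}
```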
Key endpoints under HelixRestServer:
- `/clusters`: list/manage clusters
- `/clusters/{cluster}/resources`: manage resources
- `/clusters/{cluster}/instances`: manage instances
- `/clusters/{cluster}/configs`: configuration CRUD
| What | Where |
|---|---|
| HelixManager interface | helix-core/src/main/java/org/apache/helix/HelixManager.java |
| Controller logic | helix-core/.../controller/GenericHelixController.java |
| Pipeline stages | helix-core/.../controller/stages/ |
| Rebalancers | helix-core/.../controller/rebalancer/ |
| Data model classes | helix-core/.../model/ (IdealState, CurrentState, ExternalView, Message, etc.) |
| ZNode path definitions | helix-core/.../PropertyType.java, PropertyPathBuilder.java |
| State machine framework | helix-core/.../participant/statemachine/ |
| Message handling | helix-core/.../messaging/handling/ |
| REST endpoints | helix-rest/.../rest/server/resources/ |
| ZK abstraction | zookeeper-api/src/main/java/org/apache/helix/zookeeper/ |
| Monitoring MBeans | helix-core/.../monitoring/mbeans/ |
When creating or modifying design documents:
- Use the template: All design docs MUST follow `docs/design/000-TEMPLATE.md`.
- Location: `docs/design/NNN-short-title.md` with sequential numbering.
- Required sections: Summary, Problem Statement, Goals/Non-Goals, Design, Implementation Plan (with file paths and validation commands), Testing Strategy.
- Status tracking: Set the Status field (Draft / In Review / Approved / Implementing / Done).
- Module references: List affected modules in the metadata table.
- Implementation steps: Each step must reference specific file paths, the module, and a validation command.
- Diagrams: Use Mermaid syntax. ASCII art only when Mermaid cannot express the concept.
- Cross-references: Use relative paths: `[Title](./NNN-title.md)`.