Test Harness
Two testing layers: fast integration tests (50 tests) + live agent tests
Testing Architecture
```mermaid
graph LR
    subgraph "Layer 1: Integration Tests (no LLM)"
        T["pytest<br/>4 test files, 50 tests<br/>~8 seconds"]
        T --> MB["MatrixBackend<br/>(tested directly)"]
        MB --> C1["Conduit :6167"]
    end
    subgraph "Layer 2: Live Agent Tests (with LLM)"
        H1["hermes-of-bob<br/>container"]
        H2["hermes-of-carol<br/>container"]
        H1 --> MP1["Memory Plugin"]
        H2 --> MP2["Memory Plugin"]
        MP1 --> C2["Conduit :6167"]
        MP2 --> C2
        H1 -.->|GLM-4.7| ZAI["ZAI API"]
        H2 -.->|GLM-4.7| ZAI
    end
    style T fill:#5aaa6e,color:#2d4a35
    style H1 fill:#d4a0c0,color:#3d2050
    style H2 fill:#d4a0c0,color:#3d2050
    style C1 fill:#fdd5d8,color:#3d2b2b
    style C2 fill:#fdd5d8,color:#3d2b2b
```
Layer 1: Integration Tests
test_social_awareness.py — 13 tests against live Conduit, no LLM needed.
| # | Test | What it proves |
|---|---|---|
| 1 | test_auto_join_and_discover_peer | Sync detects invite, auto-joins, extracts peer |
| 2 | test_peer_context_extracted | "About @peer: ..." message parsed into context field |
| 3 | test_introduced_by_is_alice | Room creator recorded as introducer |
| 4 | test_peer_id_format | No @ prefix or : separator in peer ID |
| 5 | test_check_messages_from_peer | Messages readable via peer name lookup |
| 6 | test_no_matrix_ids_in_messages | Message "from" uses names, not Matrix IDs |
| 7 | test_multiple_introductions | Two intros = two separate peers |
| 8 | test_get_peer_info | Full peer details returned |
| 9 | test_send_to_peer | Bob sends message, Carol sees it |
| 10 | test_introduce_peers_creates_room | Bob introduces Carol↔Dave, room created with both invited |
| 11 | test_introduce_peers_posts_context | "About" messages posted in the new introduction room |
| 12 | test_global_account_data | User-level account_data round-trips correctly |
| 13 | test_bidirectional_conversation | Multiple messages in both directions |
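Test 4 is representative of the level these tests operate at: pure assertions on the backend's outputs, no model in the loop. A minimal sketch of the peer-ID normalization it checks (the helper name below is illustrative, not the plugin's actual API):

```python
# Hypothetical helper mirroring what test_peer_id_format asserts:
# a hivemind peer ID is the Matrix localpart with the "@" prefix
# and ":server" suffix stripped.
def peer_id_from_matrix_id(matrix_id: str) -> str:
    """Strip '@' prefix and ':server' suffix from a Matrix user ID."""
    return matrix_id.lstrip("@").split(":", 1)[0]

peer_id = peer_id_from_matrix_id("@hermes-of-carol:localhost")
assert peer_id == "hermes-of-carol"
assert "@" not in peer_id and ":" not in peer_id
```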
test_sparks.py — 9 tests with mocked call_llm(), against live Conduit.
| # | Test | What it proves |
|---|---|---|
| 1 | test_summarize_peer_shape | call_llm() returns PeerSummary with correct fields |
| 2 | test_spark_detection_complementary | Matching needs/offers generates a spark |
| 3 | test_spark_detection_no_match | Unrelated peers generate no spark |
| 4 | test_spark_detection_low_confidence_filtered | Low confidence sparks filtered out |
| 5 | test_staleness_triggers_summarization | Missing summary triggers call_llm() and stores result |
| 6 | test_sparks_stored_in_global_account_data | Sparks written to user-level account_data |
| 7 | test_sparks_deduplicated | Same peer pair only generates one spark |
| 8 | test_dismiss_spark | Dismissed sparks don't appear in suggestions |
| 9 | test_executed_spark_status | Executed sparks tracked correctly |
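A minimal sketch of the mocking pattern these tests rely on: `call_llm()` is replaced with a stub so summarization and spark detection run deterministically without a model. The `PeerSummary` fields and the way the mock is wired up here are illustrative assumptions, not the plugin's exact API.

```python
from dataclasses import dataclass, field
from unittest.mock import MagicMock

@dataclass
class PeerSummary:                      # stand-in for the plugin's summary type
    peer_id: str
    needs: list = field(default_factory=list)
    offers: list = field(default_factory=list)

# In the real tests this would patch the plugin module's call_llm;
# here the mock simply returns a fixed, deterministic summary.
call_llm = MagicMock(return_value=PeerSummary(
    peer_id="hermes-of-carol",
    needs=["security audit"],
    offers=["Rust consulting"],
))

summary = call_llm("Summarize recent messages from hermes-of-carol")
assert summary.peer_id == "hermes-of-carol"
assert call_llm.call_count == 1         # summarization happened exactly once
```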
test_hivemind_plugin.py — 14 tests verifying the HiveMindProvider plugin interface.
| # | Test | What it proves |
|---|---|---|
| 1 | test_plugin_loads | HiveMindProvider instantiates and registers tools |
| 2 | test_prefetch_syncs | prefetch() triggers /sync and populates peer cache |
| 3 | test_hivemind_list_peers | hivemind_list_peers returns discovered peers |
| 4 | test_hivemind_check_messages | hivemind_check_messages returns messages from peer |
| 5 | test_hivemind_send_to_peer | hivemind_send_to_peer delivers message via Matrix |
| 6 | test_hivemind_get_peer_info | hivemind_get_peer_info returns full peer details |
| 7 | test_hivemind_introduce_peers | hivemind_introduce_peers creates room with both invited |
| 8 | test_hivemind_dismiss_spark | hivemind_dismiss_spark marks spark as dismissed |
| 9 | test_prefetch_summarizes_stale | prefetch() calls call_llm() for stale peer summaries |
| 10 | test_prefetch_detects_sparks | prefetch() evaluates peer pairs for introduction sparks |
| 11 | test_tools_registered | All 6 hivemind_* tools appear in provider.tools() |
| 12 | test_plugin_yaml_valid | plugin.yaml loads and declares correct provider |
| 13 | test_no_matrix_ids_exposed | Tool outputs use peer names, not @user:server IDs |
| 14 | test_prefetch_idempotent | Multiple prefetch() calls don't duplicate peers |
Run:
```shell
docker compose up conduit -d
python3 setup_users.py
pytest test_introducer.py test_social_awareness.py test_sparks.py test_hivemind_plugin.py -v
# 50 passed in ~8s
```
These tests verify the Matrix backend abstraction, the ambient spark pipeline, and the memory plugin interface without any LLM involvement. The spark and plugin tests mock call_llm() to exercise summarization, detection, deduplication, and lifecycle logic without a real model. Note that the tests need hermes-agent on PYTHONPATH to import the MemoryProvider base class.
Layer 2: Live Agent Tests
Full hermes-agent containers with GLM-4.7 (via ZAI) using the memory plugin against Conduit.
Container Architecture
```mermaid
graph TB
    subgraph "docker-compose.agent-test.yml"
        subgraph "Conduit"
            S["matrixconduit/matrix-conduit<br/>Port 6167<br/>114 MB RAM"]
        end
        subgraph "hermes-of-bob"
            HB["hermes-agent (python:3.11-slim)<br/>+ hivemind plugin<br/>+ matrix_backend.py"]
            MPB["Memory Plugin<br/>(loaded by hermes)"]
            HB --> MPB
        end
        subgraph "hermes-of-carol"
            HC["hermes-agent (python:3.11-slim)<br/>+ hivemind plugin<br/>+ matrix_backend.py"]
            MPC["Memory Plugin<br/>(loaded by hermes)"]
            HC --> MPC
        end
        MPB -->|"HTTP"| S
        MPC -->|"HTTP"| S
    end
    ZAI["ZAI API<br/>GLM-4.7"]
    HB -.->|"HTTPS"| ZAI
    HC -.->|"HTTPS"| ZAI
    style S fill:#fdd5d8,color:#3d2b2b
    style HB fill:#d4a0c0,color:#3d2050
    style HC fill:#d4a0c0,color:#3d2050
```
Image
| Component | Base | Size |
|---|---|---|
| Agent image | python:3.11-slim + hermes-agent + aiohttp | ~1 GB |
| Conduit | matrixconduit/matrix-conduit | ~150 MB |
The agent image is large because of the hermes-agent [all] extras; it could be trimmed to roughly 400 MB by installing only [core].
Setup Flow
Complete setup from scratch:
```shell
# 1. Build agent images
docker compose -f docker-compose.agent-test.yml build

# 2. Start Matrix server
docker compose -f docker-compose.agent-test.yml up conduit -d

# 3. Register agents + create introduction scenario
python3 agent-test/scenario.py
# → Registers hermes-of-alice, hermes-of-bob, hermes-of-carol
# → Creates introduction room with rich context
# → Writes agent-test/.env.agents with credentials

# 4. Start agent containers
docker compose -f docker-compose.agent-test.yml \
  --env-file agent-test/.env.agents up hermes-of-bob hermes-of-carol -d

# 5. Verify memory plugin
docker exec hermes-introducer-hermes-of-bob-1 hermes memory
# → hivemind: installed ✓, available ✓

# 6. Chat with an agent
docker exec -it hermes-introducer-hermes-of-bob-1 hermes chat
# → "who do you know?" → calls hivemind_list_peers → "I know hermes-of-carol..."
```
What Each Container Gets
Each agent container is identical except for Matrix credentials:
Dockerfile (agent-test/Dockerfile):
```dockerfile
FROM python:3.11-slim
RUN git clone --branch v2026.4.3 hermes-agent && pip install ".[all]" aiohttp matrix-nio

# Plugin files copied into hermes source plugins dir
COPY hivemind/ → /opt/hermes-agent/plugins/memory/hivemind/
COPY matrix_backend.py → /opt/hermes-agent/matrix_backend.py

# Also copy to site-packages (hermes resolves plugins from both paths)
RUN cp -r plugins/memory/hivemind "$SITE_PACKAGES/plugins/memory/hivemind"
RUN cp matrix_backend.py "$SITE_PACKAGES/matrix_backend.py"

COPY skills/ → /opt/hermes-agent/skills/introducer/
```
Entrypoint (agent-test/agent-entrypoint.sh):
1. Bootstrap $HERMES_HOME (skills, config, SOUL.md)
2. Inject GLM_API_KEY into hermes .env
3. Patch config.yaml:
- model: zai/glm-4.7
- memory:
provider: hivemind
env: MATRIX_HOMESERVER, MATRIX_USER_ID, MATRIX_ACCESS_TOKEN
4. exec sleep infinity (ready for `hermes chat`)
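Step 3 of the entrypoint, expressed as the equivalent Python dict for clarity; the keys mirror the config.yaml fragment above, and writing it back out would use a YAML dumper, omitted in this sketch.

```python
# The config.yaml patch the entrypoint applies, as a plain dict.
# Values are taken from the steps above; the dict itself is only a
# sketch of the resulting configuration, not the entrypoint's code.
patched = {
    "model": "zai/glm-4.7",
    "memory": {
        "provider": "hivemind",
        "env": ["MATRIX_HOMESERVER", "MATRIX_USER_ID", "MATRIX_ACCESS_TOKEN"],
    },
}

assert patched["memory"]["provider"] == "hivemind"
```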
Environment Variables
| Variable | Source | Purpose |
|---|---|---|
| HERMES_MODEL | docker-compose | zai/glm-4.7 |
| GLM_API_KEY | .env.agents | ZAI API authentication |
| GLM_BASE_URL | docker-compose | https://api.z.ai/api/coding/paas/v4 |
| MATRIX_HOMESERVER | docker-compose | http://conduit:6167 (internal Docker DNS) |
| MATRIX_USER_ID | .env.agents | Per-agent: @hermes-of-bob:localhost |
| MATRIX_ACCESS_TOKEN | .env.agents | Per-agent: generated by scenario.py |
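A minimal sketch of how a Matrix backend might pick up the per-agent credentials from the table above; `MatrixConfig` and `from_env` are assumed names for illustration, not the actual matrix_backend.py API.

```python
import os
from dataclasses import dataclass

@dataclass
class MatrixConfig:                      # illustrative config holder
    homeserver: str
    user_id: str
    access_token: str

    @classmethod
    def from_env(cls) -> "MatrixConfig":
        # Homeserver defaults to the internal Docker DNS name;
        # user ID and token are required, per-agent values.
        return cls(
            homeserver=os.environ.get("MATRIX_HOMESERVER", "http://conduit:6167"),
            user_id=os.environ["MATRIX_USER_ID"],
            access_token=os.environ["MATRIX_ACCESS_TOKEN"],
        )

# Demo values, as scenario.py would write them into .env.agents:
os.environ["MATRIX_USER_ID"] = "@hermes-of-bob:localhost"
os.environ["MATRIX_ACCESS_TOKEN"] = "example-token"
cfg = MatrixConfig.from_env()
assert cfg.user_id == "@hermes-of-bob:localhost"
```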
Test Coverage Summary
| Layer | Tests | What's tested | LLM needed? |
|---|---|---|---|
| Matrix protocol | 14 (test_introducer.py) | Room lifecycle, membership, messaging, kicks, bans | No |
| Peer abstraction | 13 (test_social_awareness.py) | Auto-join, context extraction, messaging, introductions | No |
| Ambient sparks | 9 (test_sparks.py) | Summarization, spark detection, dedup, lifecycle | No (mocked call_llm) |
| Memory plugin | 14 (test_hivemind_plugin.py) | Plugin lifecycle, prefetch, all hivemind_* tools, idempotency | No (mocked call_llm) |
| Full agent pipeline | Live scenario | hermes → GLM-4.7 → plugin → Matrix → response | Yes (ZAI) |
50 automated tests (no LLM, ~8 seconds total) plus live agent scenarios with real model inference.
Verified: Live Agent Round Trip
The full pipeline was tested end-to-end on 2026-04-04 with GLM-4.7 via ZAI:
| Step | Agent | Tool called | Result |
|---|---|---|---|
| 1. Discover peer | hermes-of-bob | hivemind_list_peers | Found hermes-of-carol (introduced by hermes-of-alice) |
| 2. Send message | hermes-of-bob | hivemind_send_to_peer | Audit request delivered to Carol |
| 3. Read & respond | hermes-of-carol | hivemind_list_peers + hivemind_check_messages + hivemind_send_to_peer | Read Bob's request, sent proposal ($15-25K, 2 weeks) |
| 4. Read response | hermes-of-bob | hivemind_check_messages | Summarized Carol's proposal accurately |
All tool calls completed in <1 second. Messages encrypted via matrix-nio (Olm/Megolm) on Continuwuity.