Test Harness
Two testing layers: fast integration tests (50 tests) + live agent tests
Testing Architecture
```mermaid
graph LR
    subgraph "Layer 1: Integration Tests (no LLM)"
        T["pytest<br/>4 test files, 50 tests<br/>~8 seconds"]
        T --> MB["MatrixBackend<br/>(tested directly)"]
        MB --> C1["Conduit :6167"]
    end
    subgraph "Layer 2: Live Agent Tests (with LLM)"
        H1["hermes-of-bob<br/>container"]
        H2["hermes-of-carol<br/>container"]
        H1 --> MP1["Memory Plugin"]
        H2 --> MP2["Memory Plugin"]
        MP1 --> C2["Conduit :6167"]
        MP2 --> C2
        H1 -.->|GLM-4.7| ZAI["ZAI API"]
        H2 -.->|GLM-4.7| ZAI
    end
    style T fill:#5aaa6e,color:#2d4a35
    style H1 fill:#d4a0c0,color:#3d2050
    style H2 fill:#d4a0c0,color:#3d2050
    style C1 fill:#fdd5d8,color:#3d2b2b
    style C2 fill:#fdd5d8,color:#3d2b2b
```
Layer 1: Integration Tests
test_social_awareness.py — 13 tests against live Conduit, no LLM needed.
| # | Test | What it proves |
|---|---|---|
| 1 | test_auto_join_and_discover_peer | Sync detects invite, auto-joins, extracts peer |
| 2 | test_peer_context_extracted | "About @peer: ..." message parsed into context field |
| 3 | test_introduced_by_is_alice | Room creator recorded as introducer |
| 4 | test_peer_id_format | No @ prefix or : separator in peer ID |
| 5 | test_check_messages_from_peer | Messages readable via peer name lookup |
| 6 | test_no_matrix_ids_in_messages | Message "from" uses names, not Matrix IDs |
| 7 | test_multiple_introductions | Two intros = two separate peers |
| 8 | test_get_peer_info | Full peer details returned |
| 9 | test_send_to_peer | Bob sends message, Carol sees it |
| 10 | test_introduce_peers_creates_room | Bob introduces Carol↔Dave, room created with both invited |
| 11 | test_introduce_peers_posts_context | "About" messages posted in the new introduction room |
| 12 | test_global_account_data | User-level account_data round-trips correctly |
| 13 | test_bidirectional_conversation | Multiple messages in both directions |
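Test 4 is representative of the level these tests operate at: pure assertions on the backend's outputs, no model in the loop. A minimal sketch of the peer-ID normalization it checks (the helper name below is illustrative, not the plugin's actual API):

```python
# Hypothetical helper mirroring what test_peer_id_format asserts:
# a hivemind peer ID is the Matrix localpart with the "@" prefix
# and ":server" suffix stripped.
def peer_id_from_matrix_id(matrix_id: str) -> str:
    """Strip '@' prefix and ':server' suffix from a Matrix user ID."""
    return matrix_id.lstrip("@").split(":", 1)[0]

peer_id = peer_id_from_matrix_id("@hermes-of-carol:localhost")
assert peer_id == "hermes-of-carol"
assert "@" not in peer_id and ":" not in peer_id
```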
test_sparks.py — 9 tests with mocked call_llm(), against live Conduit.
| # | Test | What it proves |
|---|---|---|
| 1 | test_summarize_peer_shape | call_llm() returns PeerSummary with correct fields |
| 2 | test_spark_detection_complementary | Matching needs/offers generates a spark |
| 3 | test_spark_detection_no_match | Unrelated peers generate no spark |
| 4 | test_spark_detection_low_confidence_filtered | Low confidence sparks filtered out |
| 5 | test_staleness_triggers_summarization | Missing summary triggers call_llm() and stores result |
| 6 | test_sparks_stored_in_global_account_data | Sparks written to user-level account_data |
| 7 | test_sparks_deduplicated | Same peer pair only generates one spark |
| 8 | test_dismiss_spark | Dismissed sparks don't appear in suggestions |
| 9 | test_executed_spark_status | Executed sparks tracked correctly |
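A minimal sketch of the mocking pattern these tests rely on: `call_llm()` is replaced with a stub so summarization and spark detection run deterministically without a model. The `PeerSummary` fields and the way the mock is wired up here are illustrative assumptions, not the plugin's exact API.

```python
from dataclasses import dataclass, field
from unittest.mock import MagicMock

@dataclass
class PeerSummary:                      # stand-in for the plugin's summary type
    peer_id: str
    needs: list = field(default_factory=list)
    offers: list = field(default_factory=list)

# In the real tests this would patch the plugin module's call_llm;
# here the mock simply returns a fixed, deterministic summary.
call_llm = MagicMock(return_value=PeerSummary(
    peer_id="hermes-of-carol",
    needs=["security audit"],
    offers=["Rust consulting"],
))

summary = call_llm("Summarize recent messages from hermes-of-carol")
assert summary.peer_id == "hermes-of-carol"
assert call_llm.call_count == 1         # summarization happened exactly once
```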
test_hivemind_plugin.py — 14 tests verifying the HiveMindProvider plugin interface.
| # | Test | What it proves |
|---|---|---|
| 1 | test_plugin_loads | HiveMindProvider instantiates and registers tools |
| 2 | test_prefetch_syncs | prefetch() triggers /sync and populates peer cache |
| 3 | test_hivemind_list_peers | hivemind_list_peers returns discovered peers |
| 4 | test_hivemind_check_messages | hivemind_check_messages returns messages from peer |
| 5 | test_hivemind_send_to_peer | hivemind_send_to_peer delivers message via Matrix |
| 6 | test_hivemind_get_peer_info | hivemind_get_peer_info returns full peer details |
| 7 | test_hivemind_introduce_peers | hivemind_introduce_peers creates room with both invited |
| 8 | test_hivemind_dismiss_spark | hivemind_dismiss_spark marks spark as dismissed |
| 9 | test_prefetch_summarizes_stale | prefetch() calls call_llm() for stale peer summaries |
| 10 | test_prefetch_detects_sparks | prefetch() evaluates peer pairs for introduction sparks |
| 11 | test_tools_registered | All 6 hivemind_* tools appear in provider.tools() |
| 12 | test_plugin_yaml_valid | plugin.yaml loads and declares correct provider |
| 13 | test_no_matrix_ids_exposed | Tool outputs use peer names, not @user:server IDs |
| 14 | test_prefetch_idempotent | Multiple prefetch() calls don't duplicate peers |
Run:
```shell
docker compose up conduit -d
python3 setup_users.py
pytest test_introducer.py test_social_awareness.py test_sparks.py test_hivemind_plugin.py -v
# 50 passed in ~8s
```
These tests verify the Matrix backend abstraction, the ambient spark pipeline, and the memory plugin interface without any LLM involvement. The spark and plugin tests mock call_llm() to exercise summarization, detection, deduplication, and lifecycle logic without a real model. Note that the tests need hermes-agent on PYTHONPATH to import the MemoryProvider base class.
Layer 2: Live Agent Tests
Full hermes-agent containers with GLM-4.7 (via ZAI) using the memory plugin against Conduit.
Container Architecture
```mermaid
graph TB
    subgraph "docker-compose.agent-test.yml"
        subgraph "Conduit"
            S["matrixconduit/matrix-conduit<br/>Port 6167<br/>114 MB RAM"]
        end
        subgraph "hermes-of-bob"
            HB["hermes-agent (python:3.11-slim)<br/>+ hivemind plugin<br/>+ matrix_backend.py"]
            MPB["Memory Plugin<br/>(loaded by hermes)"]
            HB --> MPB
        end
        subgraph "hermes-of-carol"
            HC["hermes-agent (python:3.11-slim)<br/>+ hivemind plugin<br/>+ matrix_backend.py"]
            MPC["Memory Plugin<br/>(loaded by hermes)"]
            HC --> MPC
        end
        MPB -->|"HTTP"| S
        MPC -->|"HTTP"| S
    end
    ZAI["ZAI API<br/>GLM-4.7"]
    HB -.->|"HTTPS"| ZAI
    HC -.->|"HTTPS"| ZAI
    style S fill:#fdd5d8,color:#3d2b2b
    style HB fill:#d4a0c0,color:#3d2050
    style HC fill:#d4a0c0,color:#3d2050
```
Image
| Component | Base | Size |
|---|---|---|
| Agent image | python:3.11-slim + hermes-agent + aiohttp | ~1 GB |
| Conduit | matrixconduit/matrix-conduit | ~150 MB |
The agent image is large because of the hermes-agent [all] extras; it could be trimmed to roughly 400 MB by installing only [core].
Setup Flow
Complete setup from scratch:
```shell
# 1. Build agent images
docker compose -f docker-compose.agent-test.yml build

# 2. Start Matrix server
docker compose -f docker-compose.agent-test.yml up conduit -d

# 3. Register agents + create introduction scenario
python3 agent-test/scenario.py
# → Registers hermes-of-alice, hermes-of-bob, hermes-of-carol
# → Creates introduction room with rich context
# → Writes agent-test/.env.agents with credentials

# 4. Start agent containers
docker compose -f docker-compose.agent-test.yml \
  --env-file agent-test/.env.agents up hermes-of-bob hermes-of-carol -d

# 5. Verify memory plugin
docker exec hermes-introducer-hermes-of-bob-1 hermes memory
# → hivemind: installed ✓, available ✓

# 6. Chat with an agent
docker exec -it hermes-introducer-hermes-of-bob-1 hermes chat
# → "who do you know?" → calls hivemind_list_peers → "I know hermes-of-carol..."
```
What Each Container Gets
Each agent container is identical except for Matrix credentials:
Dockerfile (agent-test/Dockerfile):
```dockerfile
FROM python:3.11-slim
RUN git clone --branch v2026.4.3 hermes-agent && pip install ".[all]" aiohttp matrix-nio

# Plugin files copied into hermes source plugins dir
COPY hivemind/ → /opt/hermes-agent/plugins/memory/hivemind/
COPY matrix_backend.py → /opt/hermes-agent/matrix_backend.py

# Also copy to site-packages (hermes resolves plugins from both paths)
RUN cp -r plugins/memory/hivemind "$SITE_PACKAGES/plugins/memory/hivemind"
RUN cp matrix_backend.py "$SITE_PACKAGES/matrix_backend.py"

COPY skills/ → /opt/hermes-agent/skills/introducer/
```
Entrypoint (agent-test/agent-entrypoint.sh):
1. Bootstrap $HERMES_HOME (skills, config, SOUL.md)
2. Inject GLM_API_KEY into hermes .env
3. Patch config.yaml:
- model: zai/glm-4.7
- memory:
provider: hivemind
env: MATRIX_HOMESERVER, MATRIX_USER_ID, MATRIX_ACCESS_TOKEN
4. exec sleep infinity (ready for `hermes chat`)
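Step 3 of the entrypoint, expressed as the equivalent Python dict for clarity; the keys mirror the config.yaml fragment above, and writing it back out would use a YAML dumper, omitted in this sketch.

```python
# The config.yaml patch the entrypoint applies, as a plain dict.
# Values are taken from the steps above; the dict itself is only a
# sketch of the resulting configuration, not the entrypoint's code.
patched = {
    "model": "zai/glm-4.7",
    "memory": {
        "provider": "hivemind",
        "env": ["MATRIX_HOMESERVER", "MATRIX_USER_ID", "MATRIX_ACCESS_TOKEN"],
    },
}

assert patched["memory"]["provider"] == "hivemind"
```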
Environment Variables
| Variable | Source | Purpose |
|---|---|---|
| HERMES_MODEL | docker-compose | zai/glm-4.7 |
| GLM_API_KEY | .env.agents | ZAI API authentication |
| GLM_BASE_URL | docker-compose | https://api.z.ai/api/coding/paas/v4 |
| MATRIX_HOMESERVER | docker-compose | http://conduit:6167 (internal Docker DNS) |
| MATRIX_USER_ID | .env.agents | Per-agent: @hermes-of-bob:localhost |
| MATRIX_ACCESS_TOKEN | .env.agents | Per-agent: generated by scenario.py |
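A minimal sketch of how a Matrix backend might pick up the per-agent credentials from the table above; `MatrixConfig` and `from_env` are assumed names for illustration, not the actual matrix_backend.py API.

```python
import os
from dataclasses import dataclass

@dataclass
class MatrixConfig:                      # illustrative config holder
    homeserver: str
    user_id: str
    access_token: str

    @classmethod
    def from_env(cls) -> "MatrixConfig":
        # Homeserver defaults to the internal Docker DNS name;
        # user ID and token are required, per-agent values.
        return cls(
            homeserver=os.environ.get("MATRIX_HOMESERVER", "http://conduit:6167"),
            user_id=os.environ["MATRIX_USER_ID"],
            access_token=os.environ["MATRIX_ACCESS_TOKEN"],
        )

# Demo values, as scenario.py would write them into .env.agents:
os.environ["MATRIX_USER_ID"] = "@hermes-of-bob:localhost"
os.environ["MATRIX_ACCESS_TOKEN"] = "example-token"
cfg = MatrixConfig.from_env()
assert cfg.user_id == "@hermes-of-bob:localhost"
```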
Test Coverage Summary
| Layer | Tests | What's tested | LLM needed? |
|---|---|---|---|
| Matrix protocol | 14 (test_introducer.py) | Room lifecycle, membership, messaging, kicks, bans | No |
| Peer abstraction | 13 (test_social_awareness.py) | Auto-join, context extraction, messaging, introductions | No |
| Ambient sparks | 9 (test_sparks.py) | Summarization, spark detection, dedup, lifecycle | No (mocked call_llm) |
| Memory plugin | 14 (test_hivemind_plugin.py) | Plugin lifecycle, prefetch, all hivemind_* tools, idempotency | No (mocked call_llm) |
| Full agent pipeline | Live scenario | hermes → GLM-4.7 → plugin → Matrix → response | Yes (ZAI) |
50 automated tests (no LLM, ~8 seconds total) plus live agent scenarios with real model inference.
Verified: Live Agent Round Trip
The full pipeline was tested end-to-end on 2026-04-04 with GLM-4.7 via ZAI:
| Step | Agent | Tool called | Result |
|---|---|---|---|
| 1. Discover peer | hermes-of-bob | hivemind_list_peers | Found hermes-of-carol (introduced by hermes-of-alice) |
| 2. Send message | hermes-of-bob | hivemind_send_to_peer | Audit request delivered to Carol |
| 3. Read & respond | hermes-of-carol | hivemind_list_peers + hivemind_check_messages + hivemind_send_to_peer | Read Bob's request, sent proposal ($15-25K, 2 weeks) |
| 4. Read response | hermes-of-bob | hivemind_check_messages | Summarized Carol's proposal accurately |
All tool calls completed in <1 second. Messages encrypted via matrix-nio (Olm/Megolm) on Continuwuity.