Index Store¶
Overview¶
Sentinel DV uses an index store to serve MCP tools efficiently and deterministically from potentially massive artifact trees.
Storage Backend Recommendation (v1)¶
DuckDB is the recommended default:
- Simple to ship and operate
- Strong filtering and pagination
- Deterministic queries
- Excellent for analytical workloads (100k–100M rows)
- Built-in support for complex aggregations
v1.x supports DuckDB only.
DuckDB Schema (v1)¶
Table: runs¶
Columns:
CREATE TABLE runs (
run_id TEXT PRIMARY KEY, -- short form, e.g., r_...
run_id_full TEXT UNIQUE NOT NULL, -- full sha256 hex
suite TEXT NOT NULL,
created_at TEXT NOT NULL, -- RFC3339
status TEXT NOT NULL, -- pass|fail|mixed|unknown
ci_system TEXT,
ci_build_id TEXT,
ci_job_url TEXT,
artifact_manifest_hash TEXT,
index_built_at TEXT NOT NULL
);
Indexes:
CREATE INDEX idx_runs_suite_created_at ON runs(suite, created_at);
CREATE INDEX idx_runs_ci_build ON runs(ci_system, ci_build_id);
Table: tests¶
Columns:
CREATE TABLE tests (
test_id TEXT PRIMARY KEY, -- t_...
test_id_full TEXT UNIQUE NOT NULL,
run_id TEXT NOT NULL, -- REFERENCES runs(run_id)
framework TEXT NOT NULL,
name TEXT NOT NULL,
seed INTEGER,
status TEXT NOT NULL, -- pass|fail|error|skipped|unknown
duration_ms INTEGER,
sim_vendor TEXT,
sim_version TEXT,
dut_top TEXT,
created_at TEXT NOT NULL -- copied from run for easier filtering
);
Indexes:
CREATE INDEX idx_tests_run_status ON tests(run_id, status);
CREATE INDEX idx_tests_name ON tests(name);
CREATE INDEX idx_tests_framework ON tests(framework);
CREATE INDEX idx_tests_created_at ON tests(created_at);
Table: failures¶
Columns:
CREATE TABLE failures (
failure_id TEXT PRIMARY KEY, -- f_...
failure_id_full TEXT UNIQUE NOT NULL,
test_id TEXT NOT NULL, -- REFERENCES tests(test_id)
run_id TEXT NOT NULL, -- REFERENCES runs(run_id)
severity TEXT NOT NULL,
category TEXT NOT NULL,
summary TEXT NOT NULL,
message TEXT NOT NULL, -- bounded+redacted
time_ns BIGINT,
phase TEXT,
component TEXT,
tags_json TEXT NOT NULL, -- JSON array string, bounded
tags_flat TEXT NOT NULL, -- space-delimited for LIKE queries
signature_id TEXT -- s_... precomputed clustering id
);
Indexes:
CREATE INDEX idx_failures_run_category ON failures(run_id, category);
CREATE INDEX idx_failures_test ON failures(test_id);
CREATE INDEX idx_failures_signature ON failures(signature_id);
CREATE INDEX idx_failures_time ON failures(time_ns);
Table: assertions¶
Columns:
CREATE TABLE assertions (
assertion_id TEXT PRIMARY KEY,
assertion_id_full TEXT UNIQUE NOT NULL,
language TEXT NOT NULL,
name TEXT NOT NULL,
scope TEXT NOT NULL,
file TEXT NOT NULL, -- relative
line INTEGER NOT NULL,
intent_protocol TEXT,
intent_requirement TEXT,
signals_json TEXT NOT NULL -- JSON array string, bounded
);
Indexes:
CREATE INDEX idx_assertions_name ON assertions(name);
CREATE INDEX idx_assertions_scope ON assertions(scope);
CREATE INDEX idx_assertions_file ON assertions(file);
Table: assertion_failures¶
Columns:
CREATE TABLE assertion_failures (
id INTEGER PRIMARY KEY,
assertion_id TEXT NOT NULL, -- REFERENCES assertions(assertion_id)
test_id TEXT NOT NULL, -- REFERENCES tests(test_id)
run_id TEXT NOT NULL, -- REFERENCES runs(run_id)
time_ns BIGINT,
message TEXT NOT NULL -- bounded+redacted
);
Indexes:
CREATE INDEX idx_asfail_run ON assertion_failures(run_id);
CREATE INDEX idx_asfail_test ON assertion_failures(test_id);
CREATE INDEX idx_asfail_assertion ON assertion_failures(assertion_id);
Table: coverage_summaries¶
Columns:
CREATE TABLE coverage_summaries (
id INTEGER PRIMARY KEY,
run_id TEXT NOT NULL, -- REFERENCES runs(run_id)
test_id TEXT, -- REFERENCES tests(test_id), nullable
kind TEXT NOT NULL,
metrics_json TEXT NOT NULL, -- bounded array of CoverageMetric
evidence_json TEXT
);
Indexes:
CREATE INDEX idx_cov_run_kind ON coverage_summaries(run_id, kind);
CREATE INDEX idx_cov_test_kind ON coverage_summaries(test_id, kind);
Table: topologies¶
Columns:
CREATE TABLE topologies (
test_id TEXT PRIMARY KEY, -- REFERENCES tests(test_id)
topology_json TEXT NOT NULL -- TestTopology, bounded
);
Table: evidence¶
Columns (if normalized evidence desired):
CREATE TABLE evidence (
id INTEGER PRIMARY KEY,
owner_kind TEXT NOT NULL, -- test|failure|assertion_failure|coverage|diff
owner_id TEXT NOT NULL,
kind TEXT NOT NULL,
path TEXT NOT NULL,
start_line INTEGER,
end_line INTEGER,
start_time_ns BIGINT,
end_time_ns BIGINT,
extract TEXT, -- bounded+redacted
hash TEXT
);
Indexes:
Note: Sentinel DV now writes normalized evidence rows for failures, assertion failures, and coverage summaries. Tool-level include_evidence flags read from this table with bounded payloads.
Query Semantics (How Tools Map to SQL)¶
Common Constraints¶
- Always apply page/page_size as:
- Always return
total_itemscomputed via a separateCOUNT(*)query with the same filters - Sorting must be stable: if sort field ties, add secondary sort on primary key (
test_id,failure_id) to keep deterministic ordering
tests.list¶
Filters map to WHERE clauses:
run_id = ?framework = ?status = ?name LIKE %?%(escape wildcards)seed = ?
failures.list¶
Filters map to WHERE clauses:
test_id = ?orrun_id = ?category = ?severity = ?component LIKE %?%- Tags:
- For v1, tags stored as JSON string; simplest deterministic approach:
- precompute a
tags_flat TEXTcolumn (space-delimited normalized tags) and query withLIKE - or use DuckDB's JSON functions:
json_extract
- precompute a
Recommended: Store both: - tags_json - tags_flat (lowercase, space delimited, bounded)
Then tags_any → OR of tags_flat LIKE %tag%.
regressions.summary¶
Compute:
- pass rate: count tests by status across runs in window
- top signatures: group failures by
signature_idand count distinct tests (or failures)
Recommended: define clearly: - count = number of occurrences (failures) - optionally affected_tests = count distinct test_id
Use deterministic top-N ordering: - order by affected_tests desc, then count desc, then signature_id asc
runs.diff¶
Deterministic diff components:
- Test status changes:
-
join tests by
(framework, name, seed)(or by stable test GUID if available) -
Signature changes:
- signatures in
comparenot inbase→ "new" -
in
basenot incompare→ "resolved" -
Coverage deltas:
-
compare coverage metrics by
(kind, name, scope)for shared keys -
Config deltas:
- only if config snapshot is indexed (optional v1.1)
Index Build Semantics¶
Indexing Phases¶
- Discover runs and their artifact roots
- Parse adapters producing normalized records
- Compute IDs and signatures
- Apply redaction and bounding
- Write to DuckDB in a transaction
- Record
index_built_atand schema version
Incremental Indexing (optional)¶
- Use
artifact_manifest_hashchanges to detect whether a run needs reindex - For CI-backed runs, prefer
ci_build_id+suiteas key
Integrity and Reproducibility Fields (recommended)¶
Include in index metadata table:
Table: meta¶
Suggested keys:
schema_versionindexer_versionbuilt_atconfig_hash(sha256 of normalized config affecting indexing)
This supports "same inputs → same outputs" audits.
Tight Coupling Decisions (explicit)¶
- IDs depend only on normalized semantic fields, not on indexing time
- Signature IDs depend only on category + signature-normalized summary (+ optional tags)
- Taxonomy rules are ordered and versioned; changes should increment MINOR or MAJOR depending on whether it changes enum behavior significantly
Implementation Reference¶
See sentinel_dv/indexing/store.py for the canonical DuckDB implementation.