feat: add Exa AI-powered search skill #74

Open
tgonzalezc5 wants to merge 1 commit into lsdefine:main from tgonzalezc5:feat/exa-search-skill

@tgonzalezc5

Summary

Adds a self-contained skill (not a new atomic tool) for semantic web search via Exa. The skill lives entirely under memory/ alongside existing skills like keychain.py, skill_search/, and procmem_scanner.py, so it respects the project's "don't preload skills — evolve them" philosophy: the 9 atomic tools + code_run stay untouched, and the agent opts in via from exa_search import search only when a task calls for it.

Why Exa? web_scan / web_execute_js are great for browser-driven scraping, but for "research this topic" / "find similar pages" / "filter by category or date", a semantic API returns structured, deduplicated results in one call instead of multi-turn Google scraping.

What's in the skill

  • memory/exa_search.py — wraps exa-py with a typed ExaResult dataclass and three functions:
    • search(query, ...) — search + content retrieval
    • find_similar(url, ...) — semantic neighbors
    • get_contents(urls, ...) — fetch page text for known URLs
    • Search types: auto (default), neural, fast, instant, deep, deep-lite, deep-reasoning (the deprecated keyword type is not exposed)
    • Filters: category, include_domains, exclude_domains, include_text, exclude_text, date ranges
    • Content flags compose (text + highlights + summary can all be passed together)
    • snippet cascades through highlights → summary → text so callers don't have to null-check three fields
    • Key loads from EXA_API_KEY env var, falling back to keychain.keys.exa_api_key
    • Sets x-exa-integration: generic-agent header for usage attribution
  • memory/exa_search_sop.md — SOP doc matching the style of other skill SOPs: triggers/do-not-use cases (禁用), minimal invocation (最简调用), API signatures (API 签名), typical scenarios (典型场景), pitfalls to avoid (避坑)
  • tests/test_exa_search.py — 22 unit tests covering snippet fallback, kwargs wiring, client singleton, integration header, and error paths (all mocked; no live API calls)
  • .gitignore — whitelist entries for the two new memory/ files (following the project's pattern)
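
The snippet cascade described above can be illustrated with a minimal sketch. This is not the code in memory/exa_search.py: the class name ExaResultSketch and its field defaults are assumptions made for illustration, and only the highlights → summary → text → empty ordering comes from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExaResultSketch:
    """Hypothetical stand-in for the PR's ExaResult dataclass."""

    title: str = ""
    url: str = ""
    highlights: List[str] = field(default_factory=list)
    summary: Optional[str] = None
    text: Optional[str] = None

    @property
    def snippet(self) -> str:
        # Join non-empty highlights; fall through to summary, then text,
        # then the empty string, so callers never see None.
        joined = " ".join(h for h in self.highlights if h)
        return joined or self.summary or self.text or ""


if __name__ == "__main__":
    result = ExaResultSketch(title="Example", url="https://example.com", text="body text")
    print(result.snippet[:200])
```

Exposing snippet as a property keeps the three raw fields available to callers that need a specific one, while the common "show me something readable" case needs no null checks.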

Usage

import sys; sys.path.append('../memory')
from exa_search import search

results = search(
    "latest research on retrieval augmented generation",
    category="research paper",
    num_results=10,
    start_published_date="2025-01-01T00:00:00Z",
)
for r in results:
    print(f"- {r.title}  ({r.url})")
    print(f"  {r.snippet[:200]}")

Files changed

  • memory/exa_search.py (new, 170 lines) — skill implementation
  • memory/exa_search_sop.md (new) — SOP doc
  • tests/test_exa_search.py (new, 22 tests) — unit tests
  • .gitignore — whitelist the two new memory files

Test plan

  • python -m pytest tests/test_exa_search.py -v → 22 passed
  • Snippet fallback: highlights → summary → text → empty (7 cases)
  • _convert: full result + defaults when fields are missing
  • Client construction: env var, singleton, missing key raises, missing SDK raises
  • search() kwargs: defaults, all filters forwarded, content types combine, disabled highlights, empty results, missing .results attr
  • find_similar() and get_contents() wiring
  • Existing test suite unaffected (3 pre-existing MiniMax failures reproduce on upstream/main independently)
  • Live API smoke test — requires EXA_API_KEY; not run in CI, but reproducible via python memory/exa_search.py "agent frameworks 2025" 5
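
For readers unfamiliar with the mocking approach in the test plan, here is a rough sketch of how kwargs wiring and the "missing .results attr" case could be checked without a live API call. The helper search_sketch and both test names are hypothetical and are not taken from tests/test_exa_search.py; the client method name search_and_contents is assumed here and is only ever called on a mock.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def search_sketch(client, query, num_results=5, **filters):
    """Hypothetical wrapper: forward only the filters that were actually set."""
    kwargs = {k: v for k, v in filters.items() if v is not None}
    response = client.search_and_contents(query, num_results=num_results, **kwargs)
    # Tolerate a response object without a .results attribute.
    return list(getattr(response, "results", None) or [])


def test_filters_forwarded():
    client = MagicMock()
    client.search_and_contents.return_value = SimpleNamespace(results=["hit"])
    out = search_sketch(client, "rag", category="research paper", include_domains=None)
    # include_domains=None must be dropped rather than forwarded.
    client.search_and_contents.assert_called_once_with(
        "rag", num_results=5, category="research paper"
    )
    assert out == ["hit"]


def test_missing_results_attr():
    client = MagicMock()
    client.search_and_contents.return_value = SimpleNamespace()
    assert search_sketch(client, "rag") == []


if __name__ == "__main__":
    test_filters_forwarded()
    test_missing_results_attr()
    print("ok")
```

SimpleNamespace is used for the fake response because a bare MagicMock would auto-create a .results attribute and hide the missing-attribute path.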

Notes

  • exa-py>=2.0.0 is imported lazily inside _get_client(), so the skill is free to stay out of the core install flow — users who don't need Exa never see the dependency. Install on demand: pip install exa-py. Happy to add it to a central requirements list if one is introduced later.
  • The skill is opt-in and does not touch ga.py, agent_loop.py, llmcore.py, or the 9-tool atomic set.
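
The lazy import and key lookup described in these notes might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's _get_client(): the names resolve_api_key and get_client, the keychain_lookup callable, and the error messages are all hypothetical, and the keychain fallback is modeled as a plain callable because the keychain module's interface is not shown here.

```python
import importlib
import os

_client = None  # module-level singleton, created on first use


def resolve_api_key(environ=None, keychain_lookup=None):
    """Return the key from EXA_API_KEY, else from an optional fallback callable."""
    env = os.environ if environ is None else environ
    key = env.get("EXA_API_KEY")
    if key:
        return key
    if keychain_lookup is not None:
        key = keychain_lookup()  # assumption: returns a string or None
        if key:
            return key
    raise RuntimeError("EXA_API_KEY is not set and no keychain entry was found")


def get_client(module_name="exa_py", environ=None, keychain_lookup=None):
    """Import the SDK only when a client is first requested, then cache it."""
    global _client
    if _client is None:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            raise RuntimeError("exa-py is not installed; run: pip install exa-py") from exc
        _client = module.Exa(api_key=resolve_api_key(environ, keychain_lookup))
    return _client
```

Because the import happens inside get_client, merely importing the skill module never requires the SDK; the dependency only matters once a search is actually attempted.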

Adds a self-contained search skill at memory/exa_search.py + a matching
memory/exa_search_sop.md. The skill wraps exa-py with sensible defaults
and a typed ExaResult, letting agents do semantic web search without
pre-baking a new atomic tool.

- exa_search.search(): search + content retrieval with category / domain
  / text / date filters; search types: auto, neural, fast, instant,
  deep, deep-lite, deep-reasoning (no deprecated 'keyword')
- find_similar() and get_contents() helpers
- ExaResult with snippet that cascades highlights -> summary -> text
- API key loads from EXA_API_KEY env var, falling back to keychain
- Sets x-exa-integration header for usage attribution
- 22 unit tests covering snippet fallback, kwargs wiring, client
  construction, and error paths (mock exa_py; no live API calls)
@LEE-Kyungjae

Repeated use of a single search API (such as Exa) can steadily shape an agent's decision-making through that service's data coverage and ranking logic, so bias accumulates over time. Rather than relying on one service, it would be better to draw on multiple sources or to design a structure that allows direct control over potential biases.


2 participants