What Is Aegis AI
Aegis AI is the world's first AI security agent. It runs on your Mac, Windows PC, or Linux machine and watches all of your cameras.
Not "motion detected." Your cameras finally understand what's happening.
Built on Vision Language Models, Large Language Models, and Voice Language Models — Aegis describes scenes, recognizes patterns, and learns your environment. No cloud required. No subscriptions. Your data stays home.
How It Works
Aegis connects multiple AI models into a unified intelligence pipeline that transforms raw camera footage into actionable understanding:
| Layer | What It Does | Examples |
|---|---|---|
| Vision Model (VLM) | Watches camera frames, describes the scene in natural language | SmolVLM2, Qwen 2.5 VL, Gemma 3, MiniCPM-V, LLaVA |
| Language Model (LLM) | Reasons about descriptions, answers questions, makes decisions | Built-in local models, OpenAI, Anthropic, Google |
| Agent | Maintains memory, sends alerts, executes tools, adapts to your routines | Configurable personality with soul, voice, memory, and toolbox |
| Skills | Modular ML capabilities — detection, segmentation, depth estimation | YOLO Detection, SAM2 Segmentation, Depth Anything v2 |
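The table above maps to a fairly simple composition. Here is a rough sketch in Python, using hypothetical interface names (VisionModel, LanguageModel, Skill, Agent) that illustrate the layering rather than the actual Aegis internals:

```python
# Illustrative sketch (not the Aegis source): how the four layers could be
# wired together as simple Python interfaces.
from dataclasses import dataclass, field
from typing import Protocol


class VisionModel(Protocol):
    def describe(self, frame: bytes) -> str:
        """Return a natural-language description of one camera frame."""


class LanguageModel(Protocol):
    def evaluate(self, description: str, rule: str) -> bool:
        """Decide whether a scene description satisfies a plain-English rule."""


class Skill(Protocol):
    def run(self, frame: bytes) -> dict:
        """Run a modular ML capability (detection, segmentation, depth, ...)."""


@dataclass
class Agent:
    vlm: VisionModel
    llm: LanguageModel
    skills: list[Skill] = field(default_factory=list)
    memory: list[str] = field(default_factory=list)

    def observe(self, frame: bytes, rules: list[str]) -> list[str]:
        """Describe a frame, check it against event rules, remember what matched."""
        description = self.vlm.describe(frame)
        matched = [r for r in rules if self.llm.evaluate(description, r)]
        if matched:
            self.memory.append(description)
        return matched
```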
The Pipeline in Action
Here's what happens every time a camera records a clip:
1. Frame extraction — Aegis pulls frames from the clip at your configured rate (0.1–5 fps)
2. VLM analysis — The active vision model sees each frame and writes a natural-language description: "Person approaching front door carrying a large brown box. Two vehicles parked in the driveway."
3. Event evaluation — The LLM compares the description against your configured event handlers using semantic understanding — not simple keyword matching
4. Alert delivery — If a match is found, Aegis sends a notification through your configured messaging channels (Telegram, Discord, or Slack) with the AI description and a snapshot
5. Memory update — The agent stores relevant observations in its memory, building long-term context about your environment
6. Timeline indexing — The clip, its AI description, and metadata are persisted to the timeline for future search and review
This entire pipeline runs continuously and automatically. You can also intervene at any point — ask the agent questions in natural language, search the timeline, or adjust event handlers — and the agent responds with full awareness of everything it has observed.
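As a minimal sketch of that per-clip flow, assuming OpenCV for frame sampling; the vlm, llm, notify, memory, and timeline objects are hypothetical stand-ins for the real Aegis components, not its actual API:

```python
# Hypothetical end-to-end sketch of the per-clip pipeline described above.
import cv2


def extract_frames(clip_path: str, target_fps: float = 1.0):
    """Yield frames sampled from the clip at roughly target_fps (e.g. 0.1-5 fps)."""
    video = cv2.VideoCapture(clip_path)
    native_fps = video.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(native_fps / target_fps))
    index = 0
    while True:
        ok, frame = video.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    video.release()


def process_clip(clip_path, vlm, llm, notify, memory, timeline, rules):
    """Run one clip through: describe -> evaluate -> alert -> remember -> index."""
    for frame in extract_frames(clip_path, target_fps=1.0):
        description = vlm.describe(frame)                              # VLM analysis
        matched = [r for r in rules if llm.evaluate(description, r)]   # semantic match
        for rule in matched:
            notify(rule, description, frame)                           # alert delivery
        memory.store(description)                                      # memory update
        timeline.index(clip_path, description)                         # timeline indexing
```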
What Makes This Different From Traditional Security Cameras
Traditional security cameras record video and detect motion. That's it. You get a wall of clips with no context, and you're responsible for watching all of them.
Aegis AI is fundamentally different:
| Traditional Camera | Aegis AI |
|---|---|
| "Motion detected at 3:42 PM" | "Person in a blue jacket walking up the driveway carrying a large brown box — likely a package delivery" |
| No understanding of context | Remembers your family members, daily routines, and known vehicles |
| Alert fatigue from constant false positives | Natural-language event rules that only fire when something genuinely matches |
| You have to watch every clip manually | Ask "what happened last night?" and get a summary |
| Footage stored, never analyzed | Every clip analyzed, described, indexed, and searchable |
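To make the semantic-matching difference concrete, here is an illustrative sketch of how a plain-English event rule could be evaluated by an LLM; the ask_llm callable and the prompt wording are assumptions for illustration, not the actual Aegis implementation:

```python
# Illustrative only: evaluating a plain-English event rule against a scene
# description semantically rather than by keyword matching. `ask_llm` is a
# hypothetical stand-in for whichever local or cloud model is configured.
def rule_matches(description: str, rule: str, ask_llm) -> bool:
    prompt = (
        f"Scene description: {description}\n"
        f"Event rule: {rule}\n"
        "Does the scene genuinely satisfy the rule? Answer YES or NO."
    )
    return ask_llm(prompt).strip().upper().startswith("YES")


# Example: the rule never mentions the word "box", yet a semantic match still fires.
# rule_matches(
#     "Person in a blue jacket walking up the driveway carrying a large brown box",
#     "Alert me when a package is being delivered",
#     ask_llm,
# )
```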
Cloud or Local — You Choose
Local-First Privacy
Browse and download VLMs directly from HuggingFace. Everything runs on your hardware — fully offline, zero API costs. Your camera footage, AI analysis, agent memory, and chat history never leave your machine.
The built-in AI Engine powers local inference with hardware-specific optimization:
- Apple Silicon — Metal GPU acceleration for near-real-time speeds
- NVIDIA GPUs — CUDA acceleration for fast, efficient processing
- CPU fallback — optimized inference for systems without a dedicated GPU
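As a rough illustration of that fallback order expressed in PyTorch terms (not how the built-in AI Engine is actually implemented):

```python
# Rough illustration of the hardware-selection idea described above:
# Metal (MPS) -> CUDA -> CPU fallback.
import torch


def pick_device() -> torch.device:
    if torch.backends.mps.is_available():   # Apple Silicon: Metal acceleration
        return torch.device("mps")
    if torch.cuda.is_available():           # NVIDIA GPUs: CUDA acceleration
        return torch.device("cuda")
    return torch.device("cpu")              # CPU fallback
```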
Cloud Providers
Bring your own OpenAI, Google, or Anthropic API key for maximum speed and quality. Cloud providers are optional — you pay the provider directly, and Aegis includes real-time cost estimation so you always know what you're spending.
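Cost estimation itself is simple arithmetic over token counts. A hedged sketch, with placeholder prices rather than any provider's real rates:

```python
# Sketch of per-request cost estimation for a bring-your-own-key provider.
# The prices below are assumed example rates (USD per million tokens), not
# Aegis's built-in figures; real providers publish prices that change over time.
PRICE_PER_MILLION_TOKENS = {"input": 0.50, "output": 1.50}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one API call from its token counts."""
    return (
        input_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS["input"]
        + output_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS["output"]
    )


# e.g. a clip description of ~1,200 input tokens and ~150 output tokens comes to
# roughly estimate_cost(1200, 150) ~= $0.0008 at these example rates.
```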
Key Features at a Glance
| Feature | Description |
|---|---|
| Multi-camera monitoring | Blink, Ring, RTSP, ONVIF, webcam, and mobile — all in one grid |
| AI-powered video analysis | Every clip analyzed by a VLM, producing natural-language descriptions |
| Natural-language alerts | Define event handlers in plain English — the LLM evaluates matches semantically |
| Agent with memory | Persistent memory that learns your routines, family members, and environment |
| Multi-channel messaging | Receive alerts and hold conversations via Telegram, Discord, or Slack |
| Voice interaction | Text-to-speech with multiple local AI models, plus push-to-talk input |
| Skills marketplace | Install modular AI capabilities — object detection, depth estimation, segmentation |
| Model training | Fine-tune custom YOLO models on your own camera data |
| AI video generation | Create videos and images on demand using Google Veo, Gemini, or OpenAI |
| Storage management | Configurable retention policies, storage modes, and custom media paths |
| Cross-platform | macOS (Apple Silicon + Intel), Windows (x64), Linux (AppImage + .deb) |
| Fully offline | Download local models and run without internet — no subscriptions, no cloud accounts |