OceanBase Unveils seekdb: Open Source AI-Native Hybrid Search Database for Multi-Modal RAG and AI Agents
AI applications rarely work with a single, clean dataset. Instead, they juggle user profiles, chat logs, JSON metadata, embeddings, and even spatial data—leading most teams to patch together an OLTP database, vector store, and search engine. OceanBase’s new open-source release, seekdb (under the Apache 2.0 license), aims to solve this fragmentation. It’s an AI-native hybrid search database that unifies relational data, vectors, text, JSON, and GIS in one engine, with built-in hybrid search and in-database AI workflows.
What is seekdb?
seekdb is the lightweight, embedded variant of the OceanBase engine—designed explicitly for AI applications rather than general-purpose distributed deployments.
Positioning & Deployment Modes
- Embedded database: supported
- Standalone database: supported
- Distributed database: not supported (full OceanBase covers distributed use cases)
It’s MySQL-compatible (works with MySQL drivers and SQL syntax) and runs in embedded, client-server, or standalone modes.
Supported Data Models
All data types are unified in a single storage and indexing layer:
- Relational data (with standard SQL)
- Vector data (similarity search)
- Text data (full-text search)
- JSON data
- Spatial GIS data
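To illustrate what that unification looks like, here is a sketch of a single table mixing relational columns with text, JSON, GIS, and vector data. The VECTOR(768) column type and all names are assumptions for illustration; check seekdb's DDL reference for the exact syntax.

```sql
-- Illustrative DDL only: one table holding relational, text, JSON, GIS, and vector data.
-- The VECTOR(768) type is an assumed syntax; all names are placeholders.
CREATE TABLE products (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  tenant_id   BIGINT NOT NULL,          -- relational scoping column
  title       VARCHAR(255),
  description TEXT,                     -- full-text searchable content
  attributes  JSON,                     -- semi-structured metadata
  location    POINT SRID 4326,          -- GIS column (store position)
  embedding   VECTOR(768)               -- dense embedding of the description
);
```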
Hybrid Search: Core Feature
The flagship capability of seekdb is hybrid search—combining vector-based semantic retrieval, full-text keyword matching, and scalar filters in a single query and ranking step.
Implementation Details
seekdb uses the DBMS_HYBRID_SEARCH system package, which has two key entry points:
- DBMS_HYBRID_SEARCH.SEARCH: Returns JSON results sorted by relevance.
- DBMS_HYBRID_SEARCH.GET_SQL: Returns the raw SQL string used for execution.
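As a shape-only sketch, the calls below show how the two entry points might be invoked. The parameter list (a target table plus a JSON request) is an assumption rather than the documented signature, and `products` is the placeholder table from the earlier DDL sketch.

```sql
-- Illustrative only: the parameter shape is assumed, not the documented signature.
-- Run a hybrid search and get JSON results ranked by relevance.
SELECT DBMS_HYBRID_SEARCH.SEARCH(
  'products',                                   -- target table (assumed parameter)
  '{"query": "portable charger for hiking", "topn": 10}'
) AS results;

-- Inspect the SQL the package would execute for the same request.
SELECT DBMS_HYBRID_SEARCH.GET_SQL(
  'products',
  '{"query": "portable charger for hiking", "topn": 10}'
) AS generated_sql;
```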
Supported Workflows
- Pure vector search
- Pure full-text search
- Combined hybrid search
It also supports:
- Pushing relational filters/joins down to storage.
- Reranking strategies: Weighted scores, reciprocal rank fusion, and pluggable LLM re-rankers.
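To make the fusion step concrete, below is a hand-rolled reciprocal rank fusion over a vector candidate list and a full-text candidate list. The l2_distance() function and MATCH ... AGAINST syntax are assumed names for illustration, @query_vec is a placeholder for the query embedding, and a full-text index on `description` is assumed to exist; in practice DBMS_HYBRID_SEARCH applies the configured fusion strategy for you.

```sql
-- Manual reciprocal rank fusion (RRF): score = sum over lists of 1 / (k + rank).
WITH vec_hits AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY l2_distance(embedding, @query_vec)) AS rnk
  FROM products
  ORDER BY l2_distance(embedding, @query_vec)
  LIMIT 50
),
txt_hits AS (
  SELECT id, ROW_NUMBER() OVER (
           ORDER BY MATCH(description) AGAINST('portable charger') DESC) AS rnk
  FROM products
  WHERE MATCH(description) AGAINST('portable charger')
  ORDER BY MATCH(description) AGAINST('portable charger') DESC
  LIMIT 50
)
SELECT id, SUM(1.0 / (60 + rnk)) AS rrf_score   -- k = 60 is a common default
FROM (
  SELECT id, rnk FROM vec_hits
  UNION ALL
  SELECT id, rnk FROM txt_hits
) AS candidates
GROUP BY id
ORDER BY rrf_score DESC
LIMIT 10;
```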
Benefits for RAG & AI Agents
For RAG or agent memory, you can write one SQL query to:
- Semantically match embeddings.
- Exact-match product codes or proper nouns.
- Filter by user/tenant scopes (relational constraints).
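An illustrative shape of such a query, using an assumed cosine_distance() function, MySQL-style boolean full-text matching, and plain relational predicates; the `chunks` table and all names are hypothetical.

```sql
-- One statement combining semantic ranking, exact matching, and relational scope.
-- cosine_distance() is an assumed function name; `chunks` is a hypothetical table
-- of RAG chunks / agent memory with a full-text index on `content`.
SELECT id, content,
       cosine_distance(embedding, @query_vec) AS semantic_dist
FROM chunks
WHERE tenant_id = 42                                        -- relational scope filter
  AND (product_code = 'AX-300'                              -- exact match on a code
       OR MATCH(content) AGAINST('"AX-300"' IN BOOLEAN MODE))
ORDER BY semantic_dist
LIMIT 20;
```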
Vector & Full Text Engine Deep Dive
Vector Engine Features
- Supports dense and sparse vectors.
- Metrics: Manhattan, Euclidean, inner product, cosine distance.
- Index types:
  - In-memory: HNSW, HNSW_SQ, HNSW_BQ
  - Disk-based: IVF, IVF_PQ
- Hybrid vector index: Auto-generates embeddings from raw text (no separate preprocessing pipeline).
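A sketch of index creation under these options. The WITH (...) clause, parameter names, and values below are assumptions modeled on common vector-index DDL, not seekdb's exact syntax.

```sql
-- Assumed DDL shape for a vector index; clause and parameter names are illustrative.
CREATE VECTOR INDEX idx_products_embedding
  ON products (embedding)
  WITH (type = 'HNSW', distance = 'cosine', m = 16, ef_construction = 200);

-- Disk-based variant for larger-than-memory datasets (also assumed syntax):
CREATE VECTOR INDEX idx_products_embedding_ivf
  ON products (embedding)
  WITH (type = 'IVF_PQ', distance = 'l2', nlist = 1024);
```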
Full Text Engine Features
- Queries: Keyword, phrase, Boolean.
- Ranking: BM25 relevance scoring.
- Multiple tokenizer modes.
- Key Integration: Full-text and vector indexes are first-class citizens—integrated with scalar/GIS indexes in the query planner, so no external orchestration is needed.
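A brief sketch of full-text usage, assuming seekdb's MySQL compatibility carries over FULLTEXT index DDL and MATCH ... AGAINST queries, with relevance computed by BM25 as noted above.

```sql
-- Assumes MySQL-style FULLTEXT DDL and MATCH ... AGAINST; scoring is BM25 per the
-- feature list above.
CREATE FULLTEXT INDEX idx_products_description ON products (description);

-- Boolean query: require the phrase "lithium battery" and the term "safety".
SELECT id, title,
       MATCH(description) AGAINST('+"lithium battery" +safety' IN BOOLEAN MODE) AS score
FROM products
WHERE MATCH(description) AGAINST('+"lithium battery" +safety' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 10;
```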
In-Database AI Functions
seekdb includes built-in AI functions that let you call models directly from SQL (no separate application layer):
- AI_EMBED: Convert text to embeddings.
- AI_COMPLETE: Generate text via chat/completion models.
- AI_RERANK: Rerank candidate results.
- AI_PROMPT: Assemble prompt templates and dynamic values into JSON for AI_COMPLETE.
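A sketch of how these functions might compose in SQL. Only the function names come from the release; the argument order, the model-name parameter, and the {placeholder} template syntax are assumptions, and the models are presumed to have been registered via DBMS_AI_SERVICE first.

```sql
-- Argument shapes are assumed; only the function names are from the feature list.
-- 1) Embed content at insert time (model name is a placeholder).
INSERT INTO chunks (tenant_id, content, embedding)
VALUES (42,
        'Lithium batteries must ship in UN-certified packaging.',
        AI_EMBED('my_embedding_model',
                 'Lithium batteries must ship in UN-certified packaging.'));

-- 2) Build a prompt from a template plus values, then generate an answer.
SELECT AI_COMPLETE(
         'my_chat_model',
         AI_PROMPT('Answer using only this context: {context}\nQuestion: {question}',
                   JSON_OBJECT('context',  @retrieved_context,
                               'question', 'How should lithium batteries be shipped?'))
       ) AS answer;
```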
Model management is handled via DBMS_AI_SERVICE: register external providers, set endpoint URLs, and configure API keys, all within the database.
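The call below is purely illustrative: the DBMS_AI_SERVICE package name comes from the release, but the procedure name and its parameters are hypothetical, so treat it as the intended flow rather than the real interface.

```sql
-- Hypothetical procedure name and parameters; only the DBMS_AI_SERVICE package
-- itself is named in the release. The flow: register a provider, an endpoint
-- URL, and an API key entirely inside the database.
CALL DBMS_AI_SERVICE.CREATE_MODEL(
  'my_embedding_model',                      -- local name later referenced by AI_EMBED
  'openai',                                  -- external provider (placeholder)
  'https://api.openai.com/v1/embeddings',    -- endpoint URL
  '<api-key>'                                -- API key (manage securely)
);
```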
Multimodal Data & Workload Support
seekdb handles multiple data modalities in one node:
- Vectors, text, JSON, GIS, and relational data.
You can run queries that combine:
- Semantic similarity (vectors).
- Metadata filtering (JSON).
- Spatial constraints (GIS).
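For example, a single statement can rank by an assumed vector-distance function while filtering on a JSON attribute and a spatial radius; JSON_UNQUOTE/JSON_EXTRACT and the ST_* functions follow MySQL conventions, while l2_distance() is a placeholder name.

```sql
-- Semantic ranking + JSON metadata filter + GIS radius filter in one statement.
-- l2_distance() is an assumed function name; the rest follows MySQL syntax.
SELECT id, title,
       l2_distance(embedding, @query_vec) AS dist
FROM products
WHERE JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.category')) = 'charging station'
  AND ST_Distance_Sphere(
        location,
        ST_GeomFromText('POINT(31.23 121.47)', 4326)   -- lat lon order for SRID 4326
      ) < 5000                                          -- within 5 km
ORDER BY dist
LIMIT 10;
```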
Inherited from OceanBase:
- ACID transactions.
- Row-column hybrid storage.
- Vectorized execution.
Note: Distributed scalability is reserved for the full OceanBase platform.
