OceanBase Unveils seekdb: Open Source AI-Native Hybrid Search Database for Multi-Modal RAG and AI Agents
AI applications rarely work with a single, clean dataset. Instead, they juggle user profiles, chat logs, JSON metadata, embeddings, and even spatial data—leading most teams to patch together an OLTP database, vector store, and search engine. OceanBase’s new open-source release, seekdb (under the Apache 2.0 license), aims to solve this fragmentation. It’s an AI-native hybrid search database that unifies relational data, vectors, text, JSON, and GIS in one engine, with built-in hybrid search and in-database AI workflows.
What is seekdb?
seekdb is the lightweight, embedded variant of the OceanBase engine—designed explicitly for AI applications rather than general-purpose distributed deployments.
Positioning & Deployment Modes
- Embedded database: supported
- Standalone database: supported
- Distributed database: not supported (full OceanBase covers distributed use cases)
It’s MySQL-compatible (works with MySQL drivers and SQL syntax) and runs in embedded, client-server, or standalone modes.
Supported Data Models
All data types are unified in a single storage and indexing layer:
- Relational data (with standard SQL)
- Vector data (similarity search)
- Text data (full-text search)
- JSON data
- Spatial GIS data
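To illustrate what that unification looks like, here is a sketch of a single table mixing relational columns with text, JSON, GIS, and vector data. The VECTOR(768) column type and all names are assumptions for illustration; check seekdb's DDL reference for the exact syntax.

```sql
-- Illustrative DDL only: one table holding relational, text, JSON, GIS, and vector data.
-- The VECTOR(768) type is an assumed syntax; all names are placeholders.
CREATE TABLE products (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  tenant_id   BIGINT NOT NULL,          -- relational scoping column
  title       VARCHAR(255),
  description TEXT,                     -- full-text searchable content
  attributes  JSON,                     -- semi-structured metadata
  location    POINT SRID 4326,          -- GIS column (store position)
  embedding   VECTOR(768)               -- dense embedding of the description
);
```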
Hybrid Search: Core Feature
The flagship capability of seekdb is hybrid search—combining vector-based semantic retrieval, full-text keyword matching, and scalar filters in a single query and ranking step.
Implementation Details
seekdb uses the DBMS_HYBRID_SEARCH system package, which has two key entry points:
- DBMS_HYBRID_SEARCH.SEARCH: Returns JSON results sorted by relevance.
- DBMS_HYBRID_SEARCH.GET_SQL: Returns the raw SQL string used for execution.
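As a shape-only sketch, the calls below show how the two entry points might be invoked. The parameter list (a target table plus a JSON request) is an assumption rather than the documented signature, and `products` is the placeholder table from the earlier DDL sketch.

```sql
-- Illustrative only: the parameter shape is assumed, not the documented signature.
-- Run a hybrid search and get JSON results ranked by relevance.
SELECT DBMS_HYBRID_SEARCH.SEARCH(
  'products',                                   -- target table (assumed parameter)
  '{"query": "portable charger for hiking", "topn": 10}'
) AS results;

-- Inspect the SQL the package would execute for the same request.
SELECT DBMS_HYBRID_SEARCH.GET_SQL(
  'products',
  '{"query": "portable charger for hiking", "topn": 10}'
) AS generated_sql;
```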
Supported Workflows
- Pure vector search
- Pure full-text search
- Combined hybrid search
It also supports:
- Pushing relational filters/joins down to storage.
- Reranking strategies: Weighted scores, reciprocal rank fusion, and pluggable LLM re-rankers.
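To make the fusion step concrete, below is a hand-rolled reciprocal rank fusion over a vector candidate list and a full-text candidate list. The l2_distance() function and MATCH ... AGAINST syntax are assumed names for illustration, @query_vec is a placeholder for the query embedding, and a full-text index on `description` is assumed to exist; in practice DBMS_HYBRID_SEARCH applies the configured fusion strategy for you.

```sql
-- Manual reciprocal rank fusion (RRF): score = sum over lists of 1 / (k + rank).
WITH vec_hits AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY l2_distance(embedding, @query_vec)) AS rnk
  FROM products
  ORDER BY l2_distance(embedding, @query_vec)
  LIMIT 50
),
txt_hits AS (
  SELECT id, ROW_NUMBER() OVER (
           ORDER BY MATCH(description) AGAINST('portable charger') DESC) AS rnk
  FROM products
  WHERE MATCH(description) AGAINST('portable charger')
  ORDER BY MATCH(description) AGAINST('portable charger') DESC
  LIMIT 50
)
SELECT id, SUM(1.0 / (60 + rnk)) AS rrf_score   -- k = 60 is a common default
FROM (
  SELECT id, rnk FROM vec_hits
  UNION ALL
  SELECT id, rnk FROM txt_hits
) AS candidates
GROUP BY id
ORDER BY rrf_score DESC
LIMIT 10;
```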
Benefits for RAG & AI Agents
For RAG or agent memory, you can write one SQL query to:
- Semantically match embeddings.
- Exact-match product codes or proper nouns.
- Filter by user/tenant scopes (relational constraints).
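An illustrative shape of such a query, using an assumed cosine_distance() function, MySQL-style boolean full-text matching, and plain relational predicates; the `chunks` table and all names are hypothetical.

```sql
-- One statement combining semantic ranking, exact matching, and relational scope.
-- cosine_distance() is an assumed function name; `chunks` is a hypothetical table
-- of RAG chunks / agent memory with a full-text index on `content`.
SELECT id, content,
       cosine_distance(embedding, @query_vec) AS semantic_dist
FROM chunks
WHERE tenant_id = 42                                        -- relational scope filter
  AND (product_code = 'AX-300'                              -- exact match on a code
       OR MATCH(content) AGAINST('"AX-300"' IN BOOLEAN MODE))
ORDER BY semantic_dist
LIMIT 20;
```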
Vector & Full Text Engine Deep Dive
Vector Engine Features
- Supports dense and sparse vectors.
- Metrics: Manhattan, Euclidean, inner product, cosine distance.
- Index types:
  - In-memory: HNSW, HNSW_SQ, HNSW_BQ
  - Disk-based: IVF, IVF_PQ
- Hybrid vector index: Auto-generates embeddings from raw text (no separate preprocessing pipeline).
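A sketch of index creation under these options. The WITH (...) clause, parameter names, and values below are assumptions modeled on common vector-index DDL, not seekdb's exact syntax.

```sql
-- Assumed DDL shape for a vector index; clause and parameter names are illustrative.
CREATE VECTOR INDEX idx_products_embedding
  ON products (embedding)
  WITH (type = 'HNSW', distance = 'cosine', m = 16, ef_construction = 200);

-- Disk-based variant for larger-than-memory datasets (also assumed syntax):
CREATE VECTOR INDEX idx_products_embedding_ivf
  ON products (embedding)
  WITH (type = 'IVF_PQ', distance = 'l2', nlist = 1024);
```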
Full Text Engine Features
- Queries: Keyword, phrase, Boolean.
- Ranking: BM25 relevance scoring.
- Multiple tokenizer modes.
- Key Integration: Full-text and vector indexes are first-class citizens—integrated with scalar/GIS indexes in the query planner, so no external orchestration is needed.
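A brief sketch of full-text usage, assuming seekdb's MySQL compatibility carries over FULLTEXT index DDL and MATCH ... AGAINST queries, with relevance computed by BM25 as noted above.

```sql
-- Assumes MySQL-style FULLTEXT DDL and MATCH ... AGAINST; scoring is BM25 per the
-- feature list above.
CREATE FULLTEXT INDEX idx_products_description ON products (description);

-- Boolean query: require the phrase "lithium battery" and the term "safety".
SELECT id, title,
       MATCH(description) AGAINST('+"lithium battery" +safety' IN BOOLEAN MODE) AS score
FROM products
WHERE MATCH(description) AGAINST('+"lithium battery" +safety' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 10;
```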
In-Database AI Functions
seekdb includes built-in AI functions that let you call models directly from SQL (no separate application layer):
- AI_EMBED: Convert text to embeddings.
- AI_COMPLETE: Generate text via chat/completion models.
- AI_RERANK: Rerank candidate results.
- AI_PROMPT: Assemble prompt templates and dynamic values into JSON for AI_COMPLETE.
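A sketch of how these functions might compose in SQL. Only the function names come from the release; the argument order, the model-name parameter, and the {placeholder} template syntax are assumptions, and the models are presumed to have been registered via DBMS_AI_SERVICE first.

```sql
-- Argument shapes are assumed; only the function names are from the feature list.
-- 1) Embed content at insert time (model name is a placeholder).
INSERT INTO chunks (tenant_id, content, embedding)
VALUES (42,
        'Lithium batteries must ship in UN-certified packaging.',
        AI_EMBED('my_embedding_model',
                 'Lithium batteries must ship in UN-certified packaging.'));

-- 2) Build a prompt from a template plus values, then generate an answer.
SELECT AI_COMPLETE(
         'my_chat_model',
         AI_PROMPT('Answer using only this context: {context}\nQuestion: {question}',
                   JSON_OBJECT('context',  @retrieved_context,
                               'question', 'How should lithium batteries be shipped?'))
       ) AS answer;
```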
Model management is handled via DBMS_AI_SERVICE: register external providers, set endpoint URLs, and configure API keys, all within the database.
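The call below is purely illustrative: the DBMS_AI_SERVICE package name comes from the release, but the procedure name and its parameters are hypothetical, so treat it as the intended flow rather than the real interface.

```sql
-- Hypothetical procedure name and parameters; only the DBMS_AI_SERVICE package
-- itself is named in the release. The flow: register a provider, an endpoint
-- URL, and an API key entirely inside the database.
CALL DBMS_AI_SERVICE.CREATE_MODEL(
  'my_embedding_model',                      -- local name later referenced by AI_EMBED
  'openai',                                  -- external provider (placeholder)
  'https://api.openai.com/v1/embeddings',    -- endpoint URL
  '<api-key>'                                -- API key (manage securely)
);
```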
Multimodal Data & Workload Support
seekdb handles multiple data modalities in one node:
- Vectors, text, JSON, GIS, and relational data.
You can run queries that combine:
- Semantic similarity (vectors).
- Metadata filtering (JSON).
- Spatial constraints (GIS).
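For example, a single statement can rank by an assumed vector-distance function while filtering on a JSON attribute and a spatial radius; JSON_UNQUOTE/JSON_EXTRACT and the ST_* functions follow MySQL conventions, while l2_distance() is a placeholder name.

```sql
-- Semantic ranking + JSON metadata filter + GIS radius filter in one statement.
-- l2_distance() is an assumed function name; the rest follows MySQL syntax.
SELECT id, title,
       l2_distance(embedding, @query_vec) AS dist
FROM products
WHERE JSON_UNQUOTE(JSON_EXTRACT(attributes, '$.category')) = 'charging station'
  AND ST_Distance_Sphere(
        location,
        ST_GeomFromText('POINT(31.23 121.47)', 4326)   -- lat lon order for SRID 4326
      ) < 5000                                          -- within 5 km
ORDER BY dist
LIMIT 10;
```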
Inherited from OceanBase:
- ACID transactions.
- Row-column hybrid storage.
- Vectorized execution.
Note: Distributed scalability is reserved for the full OceanBase platform.
