InkCop uses a local-first, multi-database architecture. All academic data is stored on the user’s local device, with multiple specialized databases working together to deliver cloud-comparable retrieval performance while keeping your data completely private.

Design Principles

Data Sovereignty Is Yours

InkCop’s core design principle: your academic data never leaves your device (unless you explicitly call a cloud AI endpoint). This means:

  • Unpublished paper content, ideas, and research data: ✅ Local only
  • Private knowledge base and annotations: ✅ Local only
  • Conversation history and AI outputs: ✅ Local only
  • When calling cloud LLMs: only the message you send is transmitted; knowledge base content is never uploaded

Why Multiple Databases?

Academic knowledge management involves multiple data types, each with its own optimal storage and retrieval strategy:

| Data Type | Storage Requirement | Optimal Database |
| --- | --- | --- |
| Document metadata, settings, citation relationships | Structured, queryable | Relational database (SQLite) |
| Literature document objects, knowledge graph nodes | High-performance object store | Object database (ObjectBox) |
| Semantic vectors (text embeddings) | High-dimensional nearest-neighbor search | Vector index (HNSW) |
| Entity relationship graphs | Graph queries, path retrieval | Graph database (Kuzu) |
| Conversation history, threads | Lightweight structured storage | SQLite (SQLCipher encrypted) |

Four Core Databases

1. SQLite / SQLCipher — The Foundation for Structured Data

SQLite is the world’s most widely deployed embedded relational database. InkCop uses the SQLCipher encrypted variant for sensitive data.

Stores:

  • Application settings and user configuration
  • Conversation history and thread records
  • Signal task queue (async processing scheduler)
  • Citation records and verification status
  • Notes and to-do items

Security Features:

  • Optional database-level AES-256 encryption (SQLCipher)
  • Supports encrypted export archives
  • Cross-platform compatible (Windows/macOS/Linux)
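As a minimal sketch of what this structured layer looks like, the snippet below creates a conversation-history table with Python’s standard `sqlite3` module. The schema and column names are hypothetical, not InkCop’s actual schema; with SQLCipher, the only difference at the SQL level is an initial `PRAGMA key = '...'` to unlock the encrypted file.

```python
import sqlite3

# Hypothetical conversation-history schema (illustrative only).
conn = sqlite3.connect(":memory:")  # a real app would open a file on disk
conn.execute("""
    CREATE TABLE conversations (
        id         INTEGER PRIMARY KEY,
        thread_id  TEXT NOT NULL,
        role       TEXT CHECK (role IN ('user', 'assistant')),
        content    TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO conversations (thread_id, role, content) VALUES (?, ?, ?)",
    ("t1", "user", "Summarize the BERT paper"),
)
row = conn.execute(
    "SELECT role, content FROM conversations WHERE thread_id = ?", ("t1",)
).fetchone()
print(row)  # ('user', 'Summarize the BERT paper')
```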

2. ObjectBox — The High-Performance Engine for Literature Knowledge Bases

ObjectBox is a high-performance object database designed for desktop and mobile devices; its vendor benchmarks report read/write speeds 5–15x faster than SQLite.

Stores:

  • All literature documents in the knowledge base
  • Document chunks and their metadata
  • Embedding vectors for each chunk
  • Knowledge base configuration and statistics cache

Built-in HNSW Vector Index: ObjectBox includes a built-in HNSW (Hierarchical Navigable Small World) vector index, enabling sub-millisecond approximate nearest neighbor (ANN) search across millions of document chunks. This is the core engine behind InkCop’s semantic search capability.

User query → Text vectorization → HNSW vector search → Top-K relevant chunks → Context injection → AI answer
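The vector-search step in this pipeline can be sketched in a few lines. The embeddings and chunks below are made up, and exact brute-force cosine similarity stands in for ObjectBox’s HNSW index, which returns (approximately) the same Top-K results in sub-millisecond time at scale.

```python
import numpy as np

# Toy corpus and stand-in embeddings (real systems use a text-embedding model).
chunks = ["BERT is a transformer encoder",
          "SQLite is an embedded database",
          "HNSW enables fast ANN search"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize

# Construct a query vector close to chunk 2, as a real query about HNSW would be.
query_vec = emb[2] + 0.01 * rng.normal(size=8)
query_vec /= np.linalg.norm(query_vec)

scores = emb @ query_vec                 # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]     # Top-K chunk indices, best first
print([chunks[i] for i in top_k])
```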

3. Kuzu — Graph Database for the Knowledge Graph

Kuzu is a high-performance embedded graph database used to store and query entity relationship networks.

Stores:

  • Knowledge entities: person names, institutions, concepts, terminology, etc.
  • Inter-entity relations: published-in, cites, belongs-to, proposed-by, etc.
  • NLP concept nodes: semantic-level knowledge units
  • Communities: clusters of related concepts

Why Graph Retrieval Matters:

Traditional RAG (pure vector retrieval) can only surface semantically similar passages; it cannot reason over relationships between entities. Kuzu enables InkCop to answer questions like:

  • “Which papers in my knowledge base cite the BERT model and were published in Nature journals?”
  • “Find all core concepts related to ‘Transformer architecture’ and their connections”
  • “Which other knowledge base documents does this paper’s author appear in?”
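To make the first question concrete, here is a toy in-memory graph and the join it requires. The node and relation names are invented for illustration; in Kuzu the same question would be a single declarative Cypher statement (shown in the comment) rather than hand-written set operations.

```python
# Kuzu would express this query roughly as:
#   MATCH (p:Paper)-[:CITES]->(:Model {name: 'BERT'}),
#         (p)-[:PUBLISHED_IN]->(:Venue {name: 'Nature'})
#   RETURN p.title

edges = [  # (subject, relation, object)
    ("Paper A", "CITES", "BERT"),
    ("Paper A", "PUBLISHED_IN", "Nature"),
    ("Paper B", "CITES", "BERT"),
    ("Paper B", "PUBLISHED_IN", "NeurIPS"),
    ("Paper C", "CITES", "GPT-2"),
    ("Paper C", "PUBLISHED_IN", "Nature"),
]

def match(relation, obj):
    """Return all subjects connected to `obj` via `relation`."""
    return {s for s, r, o in edges if r == relation and o == obj}

# Intersecting two relationship patterns is the graph join vector search can't do.
answer = match("CITES", "BERT") & match("PUBLISHED_IN", "Nature")
print(sorted(answer))  # ['Paper A']
```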

4. Vector Database (HNSW Index) — Conversational Memory Retrieval

Beyond knowledge base document vectors, InkCop maintains a separate vector index for conversation history, enabling the “Global Memory” feature:

  • After each conversation ends, the Q&A pair is vectorized and stored
  • When a new conversation starts, semantically relevant past conversations are automatically retrieved
  • Relevant historical memories are injected into the system prompt for genuine long-term memory
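The memory loop above can be sketched as follows. The store, the relevance score (simple word overlap standing in for vector similarity), and the prompt template are all hypothetical stand-ins for InkCop’s embedding-plus-HNSW retrieval.

```python
import re

memory = []  # (question, answer) pairs from past conversations

def remember(question, answer):
    memory.append((question, answer))

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def recall(new_question, k=1):
    # Stand-in relevance score: word overlap instead of vector similarity.
    def overlap(pair):
        return len(tokens(pair[0]) & tokens(new_question))
    return sorted(memory, key=overlap, reverse=True)[:k]

remember("What is HNSW?", "An approximate nearest-neighbor graph index.")
remember("Who proposed BERT?", "Devlin et al., 2018.")

# A new conversation starts: inject the most relevant past Q&A pairs.
relevant = recall("How does HNSW indexing work?")
system_prompt = "Relevant past conversations:\n" + "\n".join(
    f"Q: {q}\nA: {a}" for q, a in relevant
)
print(system_prompt)
```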

How the Databases Work Together

Here’s the complete data flow for a single “semantic search”:

1. User asks question → Main agent parses intent

2. Knowledge base search tools called:
   ├─ ObjectBox HNSW → Vector semantic similarity search (Top-K chunks)
   └─ Kuzu graph query → Entity relationship path retrieval

3. Result fusion: vector scores + graph relationship weights → re-ranking

4. SQLite → Query document metadata (title, author, citation info)

5. Results injected into context → Cloud LLM generates answer

6. AI answer returned with traceable citations

Throughout this process, the cloud LLM only receives your question + the retrieved relevant passages — it never has access to your complete knowledge base.
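Step 3 (result fusion) deserves a concrete sketch. The blend factor and scores below are assumptions for illustration; the flow above only states that vector scores and graph relationship weights are combined before re-ranking.

```python
# Hypothetical scores from the two retrieval paths.
vector_scores = {"chunk-1": 0.82, "chunk-2": 0.74, "chunk-3": 0.69}
graph_weights = {"chunk-2": 0.9, "chunk-3": 0.2}  # chunks on entity paths

ALPHA = 0.7  # hypothetical blend factor between the two signals

fused = {
    cid: ALPHA * vector_scores[cid] + (1 - ALPHA) * graph_weights.get(cid, 0.0)
    for cid in vector_scores
}
ranking = sorted(fused, key=fused.get, reverse=True)
# chunk-2 overtakes chunk-1: its graph-relationship weight lifts it past a
# chunk that was merely more similar in embedding space.
print(ranking)  # ['chunk-2', 'chunk-1', 'chunk-3']
```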

Local Model Support: Fully Offline Operation

InkCop supports running local models such as Qwen and Llama via the built-in llama.cpp service. With a local model enabled:

  • Inference runs on your local CPU/GPU
  • Zero network requests: all AI features work completely offline
  • Suitable for high-security lab environments or network-restricted areas
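Because the llama.cpp server exposes an OpenAI-compatible chat endpoint, a client only needs to point the request at the loopback address. The port and model name below are assumptions for illustration; the snippet builds the request payload without sending it.

```python
import json

# Hypothetical local endpoint — llama.cpp's bundled server listens on a
# port you choose; nothing here leaves the machine (loopback host).
LOCAL_ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "model": "qwen2.5-7b-instruct",   # hypothetical local model name
    "messages": [
        {"role": "system", "content": "You are an academic writing assistant."},
        {"role": "user", "content": "Summarize this paragraph: ..."},
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)  # ready to POST with any HTTP client
print(LOCAL_ENDPOINT)
```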

Encryption and Data Security

| Protection Layer | Technology | Coverage |
| --- | --- | --- |
| Database encryption | SQLCipher AES-256 | Conversation records, settings |
| Knowledge base export encryption | Custom-key encrypted archive | Exported knowledge base packages |
| Transport encryption | HTTPS/TLS | Calls to cloud LLMs |
| Local file isolation | Workspace directory isolation | Multi-user / multi-project separation |

Why This Matters for Academic Researchers

Academic research carries unusually strict data-security requirements:

  • Academic integrity risk: Uploading unpublished research to cloud servers can create priority disputes
  • Confidentiality agreements: Research contracts often prohibit uploading research data to third parties
  • AI detection concerns: Some institutions treat “AI-assisted writing” as academic misconduct; local processing keeps a clear boundary
  • Data ownership: Researchers should have complete control over their academic work

InkCop’s local multi-database architecture answers these concerns at the technical level: your academic achievements remain in your hands, always.