InkCop uses a local-first, multi-database architecture. All academic data is stored on the user’s local device, with multiple specialized databases working together to deliver cloud-comparable retrieval performance while keeping your data completely private.

Design Principles

Data Sovereignty Is Yours

InkCop’s core design principle: your academic data never leaves your device (unless you explicitly call a cloud AI endpoint). This means:

  • Unpublished paper content, ideas, and research data: ✅ Local only
  • Private knowledge base and annotations: ✅ Local only
  • Conversation history and AI outputs: ✅ Local only
  • When calling cloud LLMs: only the message you send is transmitted; knowledge base content is never uploaded

Why Multiple Databases?

Academic knowledge management involves multiple data types, each with its own optimal storage and retrieval strategy:

| Data Type | Storage Requirement | Optimal Database |
| --- | --- | --- |
| Document metadata, settings, citation relationships | Structured, queryable | Relational database (SQLite) |
| Literature document objects, knowledge graph nodes | High-performance object store | Object database (ObjectBox) |
| Semantic vectors (text embeddings) | High-dimensional nearest-neighbor search | Vector index (HNSW) |
| Entity relationship graphs | Graph queries, path retrieval | Graph database (Kuzu) |
| Conversation history, threads | Lightweight structured storage | SQLite (SQLCipher encrypted) |

Four Core Databases

1. SQLite / SQLCipher — The Foundation for Structured Data

SQLite is the world’s most widely deployed embedded relational database. InkCop uses the SQLCipher encrypted variant for sensitive data.

Stores:

  • Application settings and user configuration
  • Conversation history and thread records
  • Signal task queue (async processing scheduler)
  • Citation records and verification status
  • Notes and to-do items

Security Features:

  • Optional database-level AES-256 encryption (SQLCipher)
  • Supports encrypted export archives
  • Cross-platform compatible (Windows/macOS/Linux)
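As a minimal sketch of what this structured layer looks like, the snippet below creates a conversation-history table with Python’s standard `sqlite3` module. The schema and column names are hypothetical, not InkCop’s actual schema; with SQLCipher, the only difference at the SQL level is an initial `PRAGMA key = '...'` to unlock the encrypted file.

```python
import sqlite3

# Hypothetical conversation-history schema (illustrative only).
conn = sqlite3.connect(":memory:")  # a real app would open a file on disk
conn.execute("""
    CREATE TABLE conversations (
        id         INTEGER PRIMARY KEY,
        thread_id  TEXT NOT NULL,
        role       TEXT CHECK (role IN ('user', 'assistant')),
        content    TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO conversations (thread_id, role, content) VALUES (?, ?, ?)",
    ("t1", "user", "Summarize the BERT paper"),
)
row = conn.execute(
    "SELECT role, content FROM conversations WHERE thread_id = ?", ("t1",)
).fetchone()
print(row)  # ('user', 'Summarize the BERT paper')
```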

2. ObjectBox — The High-Performance Engine for Literature Knowledge Bases

ObjectBox is a high-performance object database designed for desktop and mobile devices; its vendor benchmarks report read/write speeds 5–15x faster than SQLite.

Stores:

  • All literature documents in the knowledge base
  • Document chunks and their metadata
  • Embedding vectors for each chunk
  • Knowledge base configuration and statistics cache

Built-in HNSW Vector Index: ObjectBox includes a built-in HNSW (Hierarchical Navigable Small World) vector index, enabling sub-millisecond approximate nearest neighbor (ANN) search across millions of document chunks. This is the core engine behind InkCop’s semantic search capability.

User query → Text vectorization → HNSW vector search → Top-K relevant chunks → Context injection → AI answer
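The vector-search step in this pipeline can be sketched in a few lines. The embeddings and chunks below are made up, and exact brute-force cosine similarity stands in for ObjectBox’s HNSW index, which returns (approximately) the same Top-K results in sub-millisecond time at scale.

```python
import numpy as np

# Toy corpus and stand-in embeddings (real systems use a text-embedding model).
chunks = ["BERT is a transformer encoder",
          "SQLite is an embedded database",
          "HNSW enables fast ANN search"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize

# Construct a query vector close to chunk 2, as a real query about HNSW would be.
query_vec = emb[2] + 0.01 * rng.normal(size=8)
query_vec /= np.linalg.norm(query_vec)

scores = emb @ query_vec                 # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]     # Top-K chunk indices, best first
print([chunks[i] for i in top_k])
```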

3. Kuzu — Graph Database for the Knowledge Graph

Kuzu is a high-performance embedded graph database used to store and query entity relationship networks.

Stores:

  • Knowledge entities: person names, institutions, concepts, terminology, etc.
  • Inter-entity relations: published-in, cites, belongs-to, proposed-by, etc.
  • NLP concept nodes: semantic-level knowledge units
  • Communities: clusters of related concepts

Why Graph Retrieval Matters:

Traditional RAG (pure vector retrieval) can only surface semantically similar passages; it cannot reason over relationships between entities. Kuzu enables InkCop to answer questions like:

  • “Which papers in my knowledge base cite the BERT model and were published in Nature journals?”
  • “Find all core concepts related to ‘Transformer architecture’ and their connections”
  • “Which other knowledge base documents does this paper’s author appear in?”
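To make the first question concrete, here is a toy in-memory graph and the join it requires. The node and relation names are invented for illustration; in Kuzu the same question would be a single declarative Cypher statement (shown in the comment) rather than hand-written set operations.

```python
# Kuzu would express this query roughly as:
#   MATCH (p:Paper)-[:CITES]->(:Model {name: 'BERT'}),
#         (p)-[:PUBLISHED_IN]->(:Venue {name: 'Nature'})
#   RETURN p.title

edges = [  # (subject, relation, object)
    ("Paper A", "CITES", "BERT"),
    ("Paper A", "PUBLISHED_IN", "Nature"),
    ("Paper B", "CITES", "BERT"),
    ("Paper B", "PUBLISHED_IN", "NeurIPS"),
    ("Paper C", "CITES", "GPT-2"),
    ("Paper C", "PUBLISHED_IN", "Nature"),
]

def match(relation, obj):
    """Return all subjects connected to `obj` via `relation`."""
    return {s for s, r, o in edges if r == relation and o == obj}

# Intersecting two relationship patterns is the graph join vector search can't do.
answer = match("CITES", "BERT") & match("PUBLISHED_IN", "Nature")
print(sorted(answer))  # ['Paper A']
```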

4. Vector Database (HNSW Index) — Conversational Memory Retrieval

Beyond knowledge base document vectors, InkCop maintains a separate vector index for conversation history, enabling the “Global Memory” feature:

  • After each conversation ends, the Q&A pair is vectorized and stored
  • When a new conversation starts, semantically relevant past conversations are automatically retrieved
  • Relevant historical memories are injected into the system prompt for genuine long-term memory
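The memory loop above can be sketched as follows. The store, the relevance score (simple word overlap standing in for vector similarity), and the prompt template are all hypothetical stand-ins for InkCop’s embedding-plus-HNSW retrieval.

```python
import re

memory = []  # (question, answer) pairs from past conversations

def remember(question, answer):
    memory.append((question, answer))

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def recall(new_question, k=1):
    # Stand-in relevance score: word overlap instead of vector similarity.
    def overlap(pair):
        return len(tokens(pair[0]) & tokens(new_question))
    return sorted(memory, key=overlap, reverse=True)[:k]

remember("What is HNSW?", "An approximate nearest-neighbor graph index.")
remember("Who proposed BERT?", "Devlin et al., 2018.")

# A new conversation starts: inject the most relevant past Q&A pairs.
relevant = recall("How does HNSW indexing work?")
system_prompt = "Relevant past conversations:\n" + "\n".join(
    f"Q: {q}\nA: {a}" for q, a in relevant
)
print(system_prompt)
```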

How the Databases Work Together

Here’s the complete data flow for a single “semantic search”:

1. User asks question → Main agent parses intent

2. Knowledge base search tools called:
   ├─ ObjectBox HNSW → Vector semantic similarity search (Top-K chunks)
   └─ Kuzu graph query → Entity relationship path retrieval

3. Result fusion: vector scores + graph relationship weights → re-ranking

4. SQLite → Query document metadata (title, author, citation info)

5. Results injected into context → Cloud LLM generates answer

6. AI answer returned with traceable citations

Throughout this process, the cloud LLM only receives your question + the retrieved relevant passages — it never has access to your complete knowledge base.
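Step 3 (result fusion) deserves a concrete sketch. The blend factor and scores below are assumptions for illustration; the flow above only states that vector scores and graph relationship weights are combined before re-ranking.

```python
# Hypothetical scores from the two retrieval paths.
vector_scores = {"chunk-1": 0.82, "chunk-2": 0.74, "chunk-3": 0.69}
graph_weights = {"chunk-2": 0.9, "chunk-3": 0.2}  # chunks on entity paths

ALPHA = 0.7  # hypothetical blend factor between the two signals

fused = {
    cid: ALPHA * vector_scores[cid] + (1 - ALPHA) * graph_weights.get(cid, 0.0)
    for cid in vector_scores
}
ranking = sorted(fused, key=fused.get, reverse=True)
# chunk-2 overtakes chunk-1: its graph-relationship weight lifts it past a
# chunk that was merely more similar in embedding space.
print(ranking)  # ['chunk-2', 'chunk-1', 'chunk-3']
```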

Local Model Support: Fully Offline Operation

InkCop supports running local models such as Qwen and Llama via the built-in llama.cpp service. With a local model enabled:

  • Inference runs on your local CPU/GPU
  • Zero network requests: all AI features work completely offline
  • Suitable for high-security lab environments or network-restricted areas
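Because the llama.cpp server exposes an OpenAI-compatible chat endpoint, a client only needs to point the request at the loopback address. The port and model name below are assumptions for illustration; the snippet builds the request payload without sending it.

```python
import json

# Hypothetical local endpoint — llama.cpp's bundled server listens on a
# port you choose; nothing here leaves the machine (loopback host).
LOCAL_ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "model": "qwen2.5-7b-instruct",   # hypothetical local model name
    "messages": [
        {"role": "system", "content": "You are an academic writing assistant."},
        {"role": "user", "content": "Summarize this paragraph: ..."},
    ],
    "temperature": 0.2,
}

body = json.dumps(payload)  # ready to POST with any HTTP client
print(LOCAL_ENDPOINT)
```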

Encryption and Data Security

| Protection Layer | Technology | Coverage |
| --- | --- | --- |
| Database encryption | SQLCipher AES-256 | Conversation records, settings |
| Knowledge base export encryption | Custom-key encrypted archive | Exported knowledge base packages |
| Transport encryption | HTTPS/TLS | Calls to cloud LLMs |
| Local file isolation | Workspace directory isolation | Multi-user / multi-project separation |

Why This Matters for Academic Researchers

Academic research carries unusually strict data-security requirements:

  • Academic integrity risk: Uploading unpublished research to cloud servers can create priority disputes
  • Confidentiality agreements: Research contracts often prohibit uploading research data to third parties
  • AI detection concerns: Some institutions treat “AI-assisted writing” as academic misconduct; local processing keeps a clear boundary
  • Data ownership: Researchers should have complete control over their academic work

InkCop’s local multi-database architecture answers these concerns at the technical level: your academic achievements remain in your hands, always.