Indexing

Kodit indexes Git repositories to create searchable code databases for AI assistants. The system extracts code snippets with semantic understanding and builds multiple search indexes for different query types.

How Indexing Works

Kodit transforms Git repositories through a 5-stage pipeline:

  1. Clone Repository: Downloads the Git repository locally
  2. Scan Repository: Extracts Git metadata (commits, branches, tags)
  3. Extract Snippets: Uses Tree-sitter parsing to extract functions, classes, and methods with dependencies
  4. Build Indexes: Creates BM25 (keyword), code embeddings (semantic), and text embeddings (natural language) indexes
  5. AI Enrichment: Generates summaries using LLM providers for enhanced search
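
As a rough illustration of stage 3, the sketch below pulls function and class definitions out of Python source. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter, and the `Snippet` type and `extract_snippets` function are hypothetical names chosen for this example, not part of Kodit's API.

```python
import ast
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical record for one extracted definition."""
    kind: str    # "function" or "class"
    name: str
    source: str


def extract_snippets(code: str) -> list[Snippet]:
    """Collect every function and class definition in a Python source string."""
    tree = ast.parse(code)
    snippets = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        # get_source_segment returns the source text spanned by the node
        snippets.append(Snippet(kind, node.name, ast.get_source_segment(code, node)))
    return snippets


EXAMPLE = '''
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

async def main():
    pass
'''

for snippet in extract_snippets(EXAMPLE):
    print(snippet.kind, snippet.name)
```

Methods are reported alongside top-level functions because the walk visits nested nodes; a real extractor would likely also record the enclosing class and any dependencies.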

Supported Sources

Kodit indexes Git repositories via:

  • HTTPS: Public and private repositories with authentication
  • SSH: Using SSH keys
  • Git Protocol: For public repositories

Kodit supports GitHub, GitLab, Bitbucket, Azure DevOps, and self-hosted Git servers.
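
The three transports can be told apart from the remote string alone. The following is a minimal sketch, assuming standard Git URL forms; `classify_transport` is a hypothetical helper written for this example and the `example.com` remotes are placeholders.

```python
from urllib.parse import urlparse


def classify_transport(remote: str) -> str:
    """Return 'https', 'ssh', or 'git' for a Git remote string."""
    scheme = urlparse(remote).scheme
    if scheme in ("https", "ssh", "git"):
        return scheme
    # scp-like syntax (user@host:path) carries no scheme but is SSH
    if "@" in remote and ":" in remote.split("@", 1)[1]:
        return "ssh"
    raise ValueError(f"unrecognised Git remote: {remote!r}")


for remote in (
    "https://example.com/user/repo.git",
    "git@example.com:user/repo.git",
    "git://example.com/user/repo.git",
):
    print(remote, "->", classify_transport(remote))
```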

REST API

Kodit provides a REST API that allows you to programmatically manage indexes and search code snippets. The API is automatically available when you start the Kodit server and follows the JSON:API specification for consistent request/response formats.

See the API documentation for a full description of the endpoints. You can also browse the live API documentation at /docs on a running Kodit server.
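
To show what a JSON:API-style request looks like, the sketch below builds (but does not send) a POST request. Only the media type and the `data`/`type`/`attributes` envelope come from the JSON:API specification; the `/api/v1/indexes` path, the `index` resource type, the `uri` attribute, and the `localhost:8080` address are assumptions made for illustration, so check /docs for the real schema.

```python
import json
from urllib.request import Request

JSON_API = "application/vnd.api+json"  # media type defined by the JSON:API spec


def build_create_request(base_url: str, repo_uri: str) -> Request:
    """Build a JSON:API-style POST request without sending it.

    The endpoint path, resource type, and attribute name are hypothetical.
    """
    document = {"data": {"type": "index", "attributes": {"uri": repo_uri}}}
    return Request(
        url=f"{base_url}/api/v1/indexes",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": JSON_API, "Accept": JSON_API},
        method="POST",
    )


req = build_create_request("http://localhost:8080", "https://example.com/user/repo.git")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the example runs without a server.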

Authentication

# HTTPS with token
https://username:<token>@<host>/username/repo.git

# SSH (ensure SSH key is configured)
git@<host>:username/repo.git
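
Tokens that contain characters such as `@`, `:`, or `/` must be percent-encoded before they are embedded in an HTTPS URL. The helper below is a minimal sketch of that step; `with_credentials` is a hypothetical name and the values shown are placeholders.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(https_url: str, username: str, token: str) -> str:
    """Insert percent-encoded credentials into an HTTPS remote URL."""
    parts = urlsplit(https_url)
    if parts.scheme != "https":
        raise ValueError("credentials can only be embedded in HTTPS URLs")
    # Drop any credentials already present, keeping host and optional port
    host = parts.netloc.rpartition("@")[2]
    # safe="" also encodes "/", "@" and ":", which would otherwise break the URL
    userinfo = f"{quote(username, safe='')}:{quote(token, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host}"))


print(with_credentials("https://example.com/user/repo.git", "alice", "p@ss/word:1"))
```

Because the token ends up in the URL, it can leak into logs and shell history; SSH keys avoid that exposure.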

Supported Languages

Kodit parses more than 20 programming languages with Tree-sitter, including:

| Language              | Extensions           | Key Features                             |
|-----------------------|----------------------|------------------------------------------|
| Python                | .py, .pyw, .pyx      | Decorators, async functions, inheritance |
| JavaScript/TypeScript | .js, .jsx, .ts, .tsx | Arrow functions, ES6 modules, types      |
| Java                  | .java                | Annotations, generics, inheritance       |
| Go                    | .go                  | Interfaces, struct methods, packages     |
| Rust                  | .rs                  | Traits, ownership patterns, macros       |
| C/C++                 | .c, .h, .cpp, .hpp   | Function pointers, templates             |
| C#                    | .cs                  | Properties, LINQ, async patterns         |
| HTML/CSS              | .html, .css, .scss   | Semantic elements, responsive patterns   |
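
The extension lists above translate directly into a lookup table. This sketch covers only the languages listed here, and `detect_language` is a hypothetical helper, not a Kodit function.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions copied from the table above (a subset of the supported languages)
EXTENSIONS = {
    "Python": (".py", ".pyw", ".pyx"),
    "JavaScript/TypeScript": (".js", ".jsx", ".ts", ".tsx"),
    "Java": (".java",),
    "Go": (".go",),
    "Rust": (".rs",),
    "C/C++": (".c", ".h", ".cpp", ".hpp"),
    "C#": (".cs",),
    "HTML/CSS": (".html", ".css", ".scss"),
}

# Invert the table so each extension maps to its language
LANGUAGE_BY_EXTENSION = {ext: lang for lang, exts in EXTENSIONS.items() for ext in exts}


def detect_language(path: str) -> Optional[str]:
    """Return the language for a file path, or None if the extension is not listed."""
    return LANGUAGE_BY_EXTENSION.get(PurePosixPath(path).suffix.lower())


for path in ("src/main.rs", "web/App.TSX", "README"):
    print(path, "->", detect_language(path))
```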

Advanced Features

Intelligent Re-indexing

  • Git commit tracking for change detection
  • Only processes new/modified commits
  • Bulk operations for performance
  • Concurrent snippet extraction
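
The idea behind incremental re-indexing can be sketched as a set difference followed by concurrent processing. This is an illustration under assumptions, not Kodit's implementation: `commits_to_index`, `reindex`, and the `process` callback are hypothetical names, and a thread pool stands in for whatever concurrency mechanism Kodit actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def commits_to_index(repo_commits: list[str], indexed: set[str]) -> list[str]:
    """Return commits that have not been indexed yet, preserving repository order."""
    return [sha for sha in repo_commits if sha not in indexed]


def reindex(repo_commits: list[str], indexed: set[str],
            process: Callable[[str], object]) -> list[str]:
    """Process only new commits concurrently and record them in `indexed`."""
    pending = commits_to_index(repo_commits, indexed)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(process, pending))
    indexed.update(pending)  # mutates the caller's set
    return pending


seen = {"a1", "b2"}
print(reindex(["a1", "b2", "c3", "d4"], seen, process=lambda sha: sha.upper()))
print(reindex(["a1", "b2", "c3", "d4"], seen, process=lambda sha: sha.upper()))
```

The second call has nothing left to process, which is the property that keeps repeated syncs cheap.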

Task Queue System

  • User-initiated (high priority)
  • Background sync (low priority)
  • Repository and commit operations
  • Automatic retry with backoff
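
A two-level priority queue with exponential backoff can be sketched with the standard library. The class, constants, and delay parameters below are assumptions for illustration; the source does not state Kodit's actual retry schedule.

```python
import heapq
import itertools

USER_INITIATED = 0  # lower number is served first
BACKGROUND = 1


class TaskQueue:
    """Hypothetical priority queue: user-initiated tasks run before background sync."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        # Monotonic counter breaks ties so tasks stay FIFO within a priority
        self._counter = itertools.count()

    def put(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def get(self) -> str:
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self) -> int:
        return len(self._heap)


def backoff_delays(base: float = 1.0, factor: float = 2.0, retries: int = 4) -> list[float]:
    """Delay in seconds before each retry attempt, growing geometrically."""
    return [base * factor ** attempt for attempt in range(retries)]


queue = TaskQueue()
queue.put(BACKGROUND, "sync repo-a")
queue.put(USER_INITIATED, "index repo-b")
queue.put(BACKGROUND, "sync repo-c")
while queue:
    print(queue.get())
print(backoff_delays())
```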

Auto-Sync Server

  • 30-minute default sync intervals
  • Background processing
  • Incremental updates only
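
The sync schedule reduces to a simple elapsed-time check. The sketch below assumes the 30-minute default stated above; `sync_due` is a hypothetical helper, not a Kodit function.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_INTERVAL = timedelta(minutes=30)


def sync_due(last_sync: Optional[datetime], now: datetime,
             interval: timedelta = DEFAULT_INTERVAL) -> bool:
    """Return True if a repository has never synced or the interval has elapsed."""
    return last_sync is None or now - last_sync >= interval


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(sync_due(None, now))
print(sync_due(now - timedelta(minutes=10), now))
print(sync_due(now - timedelta(minutes=45), now))
```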