Indexing
Kodit’s indexing system allows you to create searchable indexes of your codebases, enabling AI assistants to find and reference relevant code snippets. This page explains how indexing works, what sources are supported, and how to use the indexing features.
How Indexing Works
Kodit’s indexing process consists of several stages:
- Source Creation: Kodit clones or copies your source code to a local working directory
- File Processing: Files are scanned and metadata is extracted (timestamps, authors, etc.)
- Snippet Extraction: Code is parsed using tree-sitter to extract meaningful snippets (functions, classes, methods)
- Index Building: Multiple search indexes are created:
- BM25 Index: For keyword-based search
- Semantic Code Index: For code similarity search using embeddings
- Enrichment: AI-powered enrichment of snippets for better search results
- Semantic Text Index: For natural language search using embeddings
Supported Source Types
Kodit supports two main types of sources:
Git Repositories
Kodit can index any Git repository accessible via standard Git protocols:
- HTTPS: Public repositories and private repositories with authentication
- SSH: Using SSH keys for authentication
- Git Protocol: For public repositories
Local Directories
Kodit can index local directories on your filesystem:
- Absolute paths:
/path/to/your/code
- Relative paths:
./my-project
- Home directory expansion:
~/projects/my-app
- File URIs:
file:///path/to/your/code
Basic Usage
Indexing a Source
To index a source, use the kodit index
command followed by the source location:
# Index a local directory
kodit index /path/to/your/code
# Index a public Git repository
kodit index https://github.com/pydantic/pydantic
# Index a private Git repository (requires authentication)
kodit index https://github.com/username/private-repo
# Index using SSH
kodit index [email protected]:username/repo.git
Listing Indexes
To see all your indexed sources:
kodit index
This will display a table showing:
- Index ID
- Creation and update timestamps
- Source URI
- Number of snippets extracted
Git Protocol Support
HTTPS Authentication
For private repositories, you can authenticate using:
Personal Access Token (GitHub, GitLab, etc.):
kodit index https://username:[email protected]/username/repo.git
Username/Password (if supported by your Git provider):
kodit index https://username:[email protected]/username/repo.git
SSH Authentication
For SSH-based repositories:
SSH Key Authentication:
kodit index [email protected]:username/repo.git
Ensure your SSH key is properly configured in your SSH agent or
~/.ssh/config
.SSH with Custom Port:
kodit index ssh://[email protected]:2222/username/repo.git
Git Providers
Kodit works with any Git provider that supports standard Git protocols:
- GitHub:
https://github.com/username/repo.git
- GitLab:
https://gitlab.com/username/repo.git
- Bitbucket:
https://bitbucket.org/username/repo.git
- Azure DevOps:
https://dev.azure.com/organization/project/_git/repo
- Self-hosted Git servers: Any Git server supporting HTTP/HTTPS or SSH
Examples of Use
Index a Public Azure DevOps Repository
kodit index https://[email protected]/winderai/public-test/_git/simple-ddd-brewing-demo
Indexing a Private Azure DevOps Repository
If you’re accessing Azure DevOps from your local machine and have the Git credential helper you should be able to clone the repository as usual (obviously you won’t be able to clone this because it is private):
kodit index https://[email protected]/winderai/private-test/_git/private-test
You can also use a Personal Access Token (PAT):
kodit index https://phil:[email protected]/winderai/private-test/_git/private-test
File Processing and Filtering
Ignored Files
Kodit respects standard ignore patterns:
.gitignore
: Standard Git ignore patterns.noindex
: Custom ignore patterns for Kodit (uses gitignore syntax)
Supported File Types
Kodit automatically detects and processes files based on their extensions:
Language | Extensions |
---|---|
Python | .py |
JavaScript | .js , .jsx |
TypeScript | .ts , .tsx |
Go | .go |
C# | .cs |
Snippet Extraction
Kodit uses tree-sitter to intelligently extract code snippets:
- Functions and Methods: Complete function definitions with their bodies
- Classes: Class definitions and their methods
- Imports: Import statements for context
- Dependencies: Ancestor classes and functions that the snippet depends on
Configuration
Clone Directory
By default, Kodit stores cloned repositories in ~/.kodit/clones/
. You can configure this using the DATA_DIR
environment variable:
export DATA_DIR=/custom/path/to/kodit/data
Database Configuration
Kodit uses SQLite by default, but supports PostgreSQL with VectorChord for better performance:
# SQLite (default)
DB_URL=sqlite+aiosqlite:///path/to/kodit.db
# PostgreSQL with VectorChord
DB_URL=postgresql+asyncpg://user:password@localhost:5432/kodit
DEFAULT_SEARCH_PROVIDER=vectorchord
AI Provider Configuration
For semantic search and enrichment, configure your AI provider:
# OpenAI
DEFAULT_ENDPOINT_TYPE=openai
DEFAULT_ENDPOINT_BASE_URL=https://api.openai.com/v1
DEFAULT_ENDPOINT_API_KEY=sk-your-api-key
# Or use local models (slower but private)
# No configuration needed - uses local models by default
Advanced Features
Re-indexing Sources
Future feature!
Progress Monitoring
Kodit shows progress during indexing operations:
- File processing progress
- Snippet extraction progress
- Index building progress (BM25, embeddings)
Error Handling
Common issues and solutions:
- Authentication Errors: Ensure your credentials are correct
- Network Issues: Check your internet connection and firewall settings
- Permission Errors: Ensure you have read access to the source
- Unsupported Files: Kodit will skip unsupported file types automatically
Privacy and Security
Local Processing
- All code is processed locally by default
- No code is sent to external services unless you configure AI providers
- Cloned repositories are stored locally in your data directory
Ignore Patterns
Kodit respects privacy by honoring:
.gitignore
patterns.noindex
files for custom exclusions- Hidden files and directories (starting with
.
)
Authentication
- SSH keys and tokens are handled by your system’s Git configuration
- Kodit doesn’t store or transmit credentials
- Use environment variables for sensitive configuration
Troubleshooting
Common Issues
- “Failed to clone repository”: Check your Git credentials and network connection
- “Unsupported source”: Ensure the path or URL is valid and accessible
- “No snippets found”: Check if the source contains supported file types
- “Permission denied”: Ensure you have read access to the source
Checking Index Status
To verify your indexes are working correctly:
# List all indexes
kodit index
# Test search functionality
kodit search text "example function"