What Is Database Indexing?
Database indexing is a data structure technique that dramatically improves query performance. Just as a book's index section makes it easy to find a specific topic, database indexes provide fast access to data in tables. Without an index, the database engine must scan every row in a table to answer a query (full table scan). For large tables, this leads to unacceptably slow performance.
Searching for a record in a table with millions of rows without an index is like searching for a topic in a thousand-page book without a table of contents. Indexing can reduce search time from seconds to milliseconds. In this guide, we will explore every aspect of database indexing in depth.
Indexing Fundamentals
How Does an Index Work?
A database index is stored as a separate data structure and contains pointers to specific columns in the original table. When the query engine evaluates a WHERE, JOIN, or ORDER BY clause, it first looks at the index to determine the physical location of relevant rows and directly accesses those locations.
The Cost of Indexing
Indexes are not free. Important trade-offs to consider:
- Disk space: Each index consumes additional disk space.
- Write performance: INSERT, UPDATE, and DELETE operations must also update indexes.
- Maintenance cost: Indexes may need periodic rebuilding or optimization.
- Memory usage: Frequently used indexes are kept in memory (buffer pool).
The right index in the right place can reduce query time by 1000x. The wrong index in the wrong place can severely degrade write performance. Indexing is an art; analyze your data access patterns and determine your strategy accordingly.
1. B-Tree Index
What Is B-Tree?
B-Tree (Balanced Tree) is the default and most widely used index structure in database systems. Nearly all relational databases, including PostgreSQL, MySQL (InnoDB), SQL Server, and Oracle, use B-Tree indexes.
B-Tree Structure
B-Tree is a balanced tree structure consisting of these components:
- Root node: The topmost point of the tree. Search begins here.
- Branch nodes: Intermediate nodes that contain key values directing the search path.
- Leaf nodes: The lowest level containing actual data pointers.
B-Tree Performance Characteristics
| Operation | Time Complexity | Description |
|---|---|---|
| Search | O(log n) | Proportional to tree depth |
| Insert | O(log n) | Leaf node insertion + rebalancing |
| Delete | O(log n) | Leaf node deletion + rebalancing |
| Range query | O(log n + k) | k: result set size |
When to Use B-Tree?
- Equality queries (WHERE id = 5)
- Range queries (WHERE age BETWEEN 20 AND 30)
- Sorting (ORDER BY created_at)
- Prefix searches (WHERE name LIKE 'Ali%')
- MIN, MAX aggregate functions
2. Hash Index
What Is Hash Index?
Hash index maps key values directly to storage locations using a hash function. Unlike B-Tree, it only works for equality queries; range queries, sorting, and prefix searches are not supported.
Hash vs B-Tree Comparison
| Feature | Hash Index | B-Tree Index |
|---|---|---|
| Equality query | O(1) - Very fast | O(log n) - Fast |
| Range query | Not supported | Supported |
| Sorting | Not supported | Supported |
| Disk usage | Less | More |
| Collision handling | Required | Not required |
Hash Index Use Cases
- Exact equality lookups (WHERE email = '[email protected]')
- Session ID lookups
- Cache key lookups
- Unique identifier (UUID) queries
3. Composite (Multi-Column) Index
What Is Composite Index?
A composite index combines multiple columns into a single index structure. It provides dramatic performance improvements for queries that filter or sort by multiple columns.
Column Order Matters
Column ordering in a composite index is critically important. According to the "leftmost prefix" rule, the index is only used for queries matching the column order from left to right:
- Given an index on (A, B, C):
- WHERE A = x uses the index
- WHERE A = x AND B = y uses the index
- WHERE A = x AND B = y AND C = z fully uses the index
- WHERE B = y does NOT use the index (A is skipped)
Composite Index Strategies
- High selectivity first: Place the most selective column at the beginning
- Equality before range: Place equality conditions before range conditions
- Covering index: Include all columns needed by the query to eliminate table access entirely
4. Query Optimization
The EXPLAIN Command
EXPLAIN shows how the database engine plans to execute a query. It is an indispensable tool for verifying index usage and identifying performance bottlenecks.
Reading EXPLAIN Output
Key metrics in EXPLAIN output:
- Seq Scan: Full table scan — no index used, generally a bad sign
- Index Scan: Data accessed via index — desired state
- Index Only Scan: Only the index is read, no table access — best case
- Bitmap Index Scan: Combination of multiple indexes
- Rows: Estimated number of rows to process
- Cost: Estimated cost of the query
Common Query Optimization Techniques
- Avoid SELECT *: Select only the columns you need.
- Optimize WHERE clauses: Do not use functions on indexed columns.
- Optimize JOIN order: Join from smaller to larger tables.
- JOIN over subquery: In most cases, JOIN outperforms subqueries.
- Use LIMIT: Always add LIMIT for large result sets.
- Prepared statements: Cache query plans for repeated queries.
5. Specialized Index Types
GIN (Generalized Inverted Index)
An index type optimized for JSON, arrays, and full-text search. Widely used in PostgreSQL for JSONB columns and tsvector searches.
GiST (Generalized Search Tree)
Used for geometric data, geographic queries, and range types. Critical for geographic database queries with PostGIS.
BRIN (Block Range Index)
A highly efficient, low-disk-usage index type for physically ordered data. Ideal for time-series data and log tables.
Indexing Best Practices
- Index frequently queried columns, not all of them
- B-Tree index is inefficient on low cardinality (few distinct values) columns
- Remove unused indexes (check with pg_stat_user_indexes)
- Monitor index size regularly
- Use partial indexes to index only rows matching specific conditions
Conclusion
Database indexing is one of the most critical determinants of application performance. With B-Tree, you achieve excellent performance for general-purpose queries; with hash indexes, instant results for equality lookups; with composite indexes, you accelerate multi-condition queries; and with EXPLAIN, you gain the ability to analyze query plans. The right indexing strategy ensures your database responds within milliseconds; the wrong strategy is an invisible enemy slowing down your system.