Vector Embeddings Explained: From Basics to RAG Applications

Vector embeddings have become the foundation of modern AI applications, transforming how machines understand and process human language. These mathematical representations convert words, sentences, and documents into numerical vectors that capture semantic meaning, enabling computers to measure similarity, find relevant information, and power sophisticated search systems. Understanding vector embeddings is essential for anyone building AI applications, from simple semantic search to complex retrieval-augmented generation (RAG) systems.

What Are Vector Embeddings?

Vector embeddings are numerical representations of data—typically text, but also images, audio, or other content—that capture semantic meaning in a format computers can process mathematically. Unlike traditional keyword-based approaches that treat words as discrete symbols, embeddings represent concepts as points in a high-dimensional space where semantically similar items are positioned close together.

At their core, embeddings are arrays of floating-point numbers. For example, the word “cat” might be represented as a vector like [0.2, -0.5, 0.8, 0.1, …] with hundreds or thousands of dimensions. The specific numbers aren’t meaningful in isolation, but their relationships to other vectors encode semantic information. Words with similar meanings—like “cat” and “kitten”—will have vectors that point in similar directions in this mathematical space.

The power of embeddings lies in their ability to capture nuanced relationships. In a well-trained embedding space, you can perform mathematical operations that reflect semantic relationships. The classic example is: vector(“king”) - vector(“man”) + vector(“woman”) ≈ vector(“queen”). This demonstrates how embeddings encode not just individual word meanings but also the relationships between concepts.
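
A toy illustration of this arithmetic with made-up four-dimensional vectors (real embeddings have hundreds or thousands of learned dimensions, and the analogy only holds approximately):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings; real models learn these values from data.
king  = np.array([0.9, 0.8, 0.1, 0.3])
man   = np.array([0.1, 0.8, 0.1, 0.2])
woman = np.array([0.1, 0.1, 0.9, 0.2])
queen = np.array([0.9, 0.1, 0.9, 0.3])

analogy = king - man + woman  # vector("king") - vector("man") + vector("woman")

# Cosine similarity between the analogy vector and "queen"
cos = analogy @ queen / (np.linalg.norm(analogy) * np.linalg.norm(queen))
print(f"similarity(king - man + woman, queen) = {cos:.3f}")  # ~1.0 in this contrived example
```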

Embeddings solve a fundamental problem in natural language processing: computers need numbers to perform calculations, but human language is symbolic and contextual. Traditional approaches like one-hot encoding (representing each word as a unique position in a massive sparse vector) fail to capture meaning or relationships. Embeddings compress semantic information into dense vectors where every dimension contributes to representing meaning.

Modern embedding models can represent not just individual words but entire sentences, paragraphs, or documents as single vectors. This enables applications to compare the semantic similarity of any text, regardless of length or specific wording. Two sentences that express the same idea using completely different words will have similar embedding vectors, making embeddings ideal for tasks like semantic search, document clustering, and question answering.

How Embedding Models Transform Text to Vectors

Embedding models are neural networks trained to convert text into vector representations that capture semantic meaning. The transformation process involves multiple stages of processing that progressively extract and encode linguistic features into numerical form.

The process begins with tokenization, where input text is split into smaller units called tokens. These might be words, subwords, or even individual characters, depending on the model’s design. Modern models typically use subword tokenization, which breaks text into meaningful chunks that balance vocabulary size with the ability to handle rare or unknown words. For example, “unhappiness” might be tokenized as [“un”, “happiness”] or [“unhappy”, “ness”].
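
To see subword tokenization in practice, here is a short sketch using a Hugging Face tokenizer; the package and model name are assumptions, and the exact splits depend on each model's vocabulary:

```python
# Requires the Hugging Face `transformers` package (an assumed tool choice).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Subword pieces; continuations are prefixed with "##" in BERT's WordPiece scheme.
print(tokenizer.tokenize("unhappiness"))
print(tokenizer.tokenize("The cat sat."))        # common words usually stay whole

# Each token maps to an integer id used to look up its initial embedding.
print(tokenizer.convert_tokens_to_ids(tokenizer.tokenize("unhappiness")))
```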

After tokenization, each token is mapped to an initial embedding vector through a lookup table. These initial embeddings are learned during training and serve as the starting point for deeper processing. The model then passes these token embeddings through multiple layers of neural network transformations, typically using architectures like transformers that can capture relationships between tokens regardless of their distance in the text.

Contextual Processing

The key innovation in modern embedding models is their ability to generate contextual embeddings. Unlike older approaches where each word had a single fixed vector, contemporary models produce different embeddings for the same word depending on its context. The word “bank” will have different embeddings in “river bank” versus “savings bank” because the model processes the entire input sequence to determine meaning.

Transformer-based models achieve this through attention mechanisms, which allow each token to “attend to” other tokens in the sequence. When processing the word “bank” in “I deposited money at the bank,” the model gives high attention weights to words like “deposited” and “money,” which help disambiguate the financial meaning. This contextual information is integrated into the final embedding vector.

Pooling Strategies

For sentence or document embeddings, models must combine the individual token embeddings into a single vector. Common strategies include mean pooling (averaging all token vectors), max pooling (taking the maximum value across each dimension), or using special tokens like [CLS] that are specifically trained to represent the entire sequence. The choice of pooling strategy affects how well the final embedding captures the overall meaning versus specific details.
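
A minimal NumPy sketch of mean and max pooling, using random placeholder values in place of real token embeddings from a model's final layer:

```python
import numpy as np

# Token embeddings for one sentence: shape (num_tokens, dim).
token_embeddings = np.random.rand(7, 384)
attention_mask = np.array([1, 1, 1, 1, 1, 0, 0])  # 1 = real token, 0 = padding

# Mean pooling: average only over real tokens, ignoring padding positions.
mask = attention_mask[:, None]
mean_pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()

# Max pooling: the maximum value in each dimension across real tokens.
max_pooled = token_embeddings[attention_mask == 1].max(axis=0)

print(mean_pooled.shape, max_pooled.shape)  # (384,) (384,)
```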

The training process teaches embedding models to position semantically similar texts close together in vector space. Models are typically trained on large text corpora using objectives like predicting masked words, distinguishing similar from dissimilar sentence pairs, or maximizing similarity between paraphrases. Through this training, the model learns to encode semantic meaning in a way that generalizes to new, unseen text.

Understanding Embedding Dimensions and Vector Space

The dimensionality of embedding vectors—the number of numerical values in each vector—is a critical design choice that affects both the expressiveness of the representation and the computational resources required to work with embeddings.

Embedding dimensions typically range from 128 to 4096, with common sizes being 384, 768, and 1536 dimensions. Each dimension can be thought of as a learned feature that captures some aspect of semantic meaning. While we can’t interpret individual dimensions in human terms, collectively they encode information about topics, sentiment, grammatical structure, and countless other linguistic properties.

The Dimensionality Trade-off

Higher-dimensional embeddings can capture more nuanced semantic information and typically achieve better performance on downstream tasks. A 1536-dimensional embedding has more capacity to distinguish subtle differences in meaning than a 384-dimensional one. However, higher dimensions come with significant costs: they require more memory to store, more computation to process, and more training data to learn effectively.

The relationship between dimensions and performance isn’t linear. Doubling the dimensions doesn’t double the quality of the embeddings. In many applications, there’s a point of diminishing returns where additional dimensions provide minimal improvement. For example, a 768-dimensional embedding might capture 95% of the semantic information that a 1536-dimensional embedding captures, while using half the resources.

Vector Space Geometry

Embedding vectors exist in a high-dimensional space with geometric properties that reflect semantic relationships. Semantically similar texts cluster together in this space, while dissimilar texts are far apart. The geometry of this space is learned during training and reflects the patterns in the training data.

In a well-structured embedding space, you can observe meaningful geometric patterns. Synonyms form tight clusters, antonyms often lie in opposite directions from a central point, and related concepts form neighborhoods. For instance, all animal names might cluster in one region of the space, with subclusters for mammals, birds, and reptiles.

The curse of dimensionality affects embedding spaces: as dimensions increase, the volume of the space grows exponentially, and points become increasingly sparse. This means that in very high-dimensional spaces, most vectors are roughly equidistant from each other, which can make similarity comparisons less meaningful. Well-trained embedding models mitigate this by learning to use the available dimensions efficiently, concentrating semantic information in ways that maintain meaningful distance relationships.

Practical Implications

When working with embeddings, the choice of dimensionality affects your entire system architecture. Higher dimensions mean more storage in vector databases, slower similarity searches, and more bandwidth for data transfer. For applications processing millions of documents, the difference between 384 and 1536 dimensions can mean the difference between a system that fits in memory and one that requires distributed infrastructure.

Dimension reduction techniques like principal component analysis (PCA) can compress high-dimensional embeddings to lower dimensions while preserving most of the semantic information. This can be useful for reducing storage and computation costs, though it typically comes with some loss in accuracy. The optimal dimensionality depends on your specific use case, data characteristics, and performance requirements.
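
A short sketch of this compression using scikit-learn's PCA (an assumed tool choice), with random placeholder embeddings; on real data you would check how much variance, and ultimately retrieval quality, is preserved:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder corpus embeddings: 10,000 vectors of 1536 dimensions.
embeddings = np.random.rand(10_000, 1536).astype(np.float32)

# Fit PCA and project down to 256 dimensions.
pca = PCA(n_components=256)
reduced = pca.fit_transform(embeddings)

print(reduced.shape)                        # (10000, 256)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```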

Cosine Similarity and Distance Metrics

Measuring the similarity or distance between embedding vectors is fundamental to nearly every application that uses embeddings. Different metrics capture different notions of similarity, and choosing the right one affects the quality of your results.

Cosine Similarity

Cosine similarity is the most widely used metric for comparing embeddings. It measures the cosine of the angle between two vectors, producing a value between -1 and 1, where 1 indicates identical direction, 0 indicates orthogonality (no similarity), and -1 indicates opposite directions. The formula is:

cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)

where A · B is the dot product of vectors A and B, and ||A|| represents the magnitude (length) of vector A.

Cosine similarity is particularly well-suited for embeddings because it focuses on the direction of vectors rather than their magnitude. Two vectors pointing in the same direction are considered similar even if one is much longer than the other. This property is valuable because embedding magnitudes can vary based on factors like text length or model confidence, but the direction captures the core semantic meaning.

In practice, cosine similarity is computationally efficient and works well with normalized embeddings (vectors scaled to unit length). Many embedding models produce normalized vectors by default, in which case cosine similarity is equivalent to the dot product, making computation even faster.
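
The formula translates directly into a few lines of NumPy; the vectors below are illustrative placeholders:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.2, -0.5, 0.8, 0.1])
b = np.array([0.25, -0.4, 0.7, 0.0])

print(cosine_similarity(a, b))  # close to 1: similar direction

# With unit-normalized vectors, the plain dot product gives the same answer.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(float(np.dot(a_n, b_n)))
```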

Euclidean Distance

Euclidean distance measures the straight-line distance between two points in vector space. For vectors A and B, it’s calculated as:

euclidean_distance(A, B) = √(Σ(A_i - B_i)²)

Unlike cosine similarity, Euclidean distance considers both direction and magnitude. Two vectors pointing in the same direction but with different lengths will have a non-zero Euclidean distance. This can be advantageous when magnitude carries meaningful information, but for most embedding applications, direction is more important than magnitude.

Euclidean distance produces values from 0 (identical vectors) to infinity, with smaller values indicating greater similarity. When working with normalized embeddings, Euclidean distance and cosine similarity are closely related and often produce similar rankings of results.

Dot Product Similarity

The dot product is the simplest similarity metric, calculated as:

dot_product(A, B) = Σ(A_i × B_i)

For normalized embeddings, the dot product equals cosine similarity. For unnormalized embeddings, it considers both direction and magnitude, rewarding vectors that are both aligned and large. Some applications intentionally use unnormalized embeddings with dot product similarity when magnitude should influence similarity scores.

Manhattan Distance

Manhattan distance (also called L1 distance or taxicab distance) sums the absolute differences between corresponding dimensions:

manhattan_distance(A, B) = Σ|A_i - B_i|

This metric is less commonly used for embeddings but can be more robust to outliers in individual dimensions compared to Euclidean distance. It’s computationally simpler than Euclidean distance since it avoids the square root operation.
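
The sketch below contrasts how these metrics treat two toy vectors that share a direction but differ in magnitude:

```python
import numpy as np

a = np.array([0.2, -0.5, 0.8, 0.1])
b = np.array([0.4, -1.0, 1.6, 0.2])   # same direction as a, twice the magnitude

euclidean = np.linalg.norm(a - b)                        # sensitive to magnitude: non-zero here
dot       = float(np.dot(a, b))                          # rewards vectors that are aligned *and* large
manhattan = np.abs(a - b).sum()                          # L1 distance: sum of absolute differences
cosine    = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0: identical direction

print(euclidean, dot, manhattan, cosine)
```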

Choosing the Right Metric

The choice of distance metric should align with how your embedding model was trained and what your application needs to measure. Most modern embedding models are trained with cosine similarity in mind, making it the default choice for most applications. Cosine similarity is particularly appropriate when you want to measure semantic similarity regardless of factors like text length or confidence scores that might affect vector magnitude.

For applications requiring fast similarity search across millions of vectors, the computational efficiency of different metrics matters. Dot product is fastest for normalized embeddings, followed by Manhattan distance, then Euclidean distance. Vector databases often optimize for specific metrics, so your choice may affect query performance.

Some advanced applications use learned distance metrics, where a neural network learns to compute similarity in a task-specific way. This can improve performance for specialized applications but requires additional training data and computation.

Choosing Embedding Models for Your Use Case

Selecting the right embedding model is a critical decision that affects the performance, cost, and capabilities of your AI application. Different models excel at different tasks, and the optimal choice depends on your specific requirements and constraints.

Model Families and Architectures

Embedding models come in several families, each with different strengths. Encoder-only models based on BERT-style architectures excel at understanding and comparing text. These models process input bidirectionally, considering both preceding and following context, which makes them particularly effective for semantic search and classification tasks.

Sentence transformer models are specifically fine-tuned to produce high-quality sentence and document embeddings. These models are trained on sentence pairs to learn representations that capture semantic similarity at the sentence level, making them ideal for applications that need to compare or retrieve entire passages rather than individual words.

Multilingual models are trained on text in many languages and can produce embeddings that capture semantic similarity across language boundaries. A query in English can retrieve relevant documents in Spanish or Japanese if both are embedded with a multilingual model. This capability is essential for global applications but typically comes with some performance trade-off compared to monolingual models.

Performance Considerations

Model size directly impacts both quality and computational requirements. Larger models with more parameters generally produce better embeddings but require more memory and processing time. A model with 110 million parameters might generate embeddings in milliseconds on a CPU, while a 7 billion parameter model might require GPU acceleration and take seconds per input.

The embedding dimension produced by the model affects downstream storage and search performance. Smaller dimensions (128-384) are faster to search and require less storage, while larger dimensions (768-1536) typically capture more semantic nuance. Some models offer multiple dimension options, allowing you to choose the appropriate trade-off for your use case.

Throughput requirements should guide your model selection. If you need to embed millions of documents in batch processing, a smaller, faster model might be more practical than a larger, more accurate one. For real-time applications where latency matters, model inference speed becomes critical.

Task-Specific Optimization

Different embedding models are optimized for different tasks. Models trained specifically for retrieval tasks excel at distinguishing relevant from irrelevant documents but might not perform as well for clustering or classification. Models fine-tuned on domain-specific data (medical texts, legal documents, code) often outperform general-purpose models in those domains.

For RAG applications, you want models that excel at asymmetric similarity—comparing short queries to longer documents. Some models are specifically trained for this task and perform better than models trained on symmetric similarity (comparing similar-length texts).

If your application involves specialized vocabulary or domain-specific language, consider models that support fine-tuning. You can adapt a general-purpose model to your domain by training it on your specific data, potentially achieving better performance than using a larger general-purpose model.

Practical Evaluation

Benchmark scores provide useful guidance but don’t tell the whole story. Models should be evaluated on data similar to your actual use case. A model that achieves state-of-the-art results on academic benchmarks might underperform on your specific data if the domains differ significantly.

Create a representative test set from your actual data and evaluate candidate models on metrics that matter for your application. For semantic search, measure retrieval accuracy using metrics like precision, recall, and mean reciprocal rank. For clustering, evaluate how well the embeddings group similar items together.
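
For retrieval, mean reciprocal rank is straightforward to compute once you have ranked results and relevance labels; a minimal sketch:

```python
def mean_reciprocal_rank(ranked_results, relevant_ids):
    """ranked_results: one ranked list of doc ids per query.
    relevant_ids: one set of relevant doc ids per query."""
    total = 0.0
    for ranking, relevant in zip(ranked_results, relevant_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_results)

# Two test queries: the first finds a relevant doc at rank 2, the second at rank 1.
print(mean_reciprocal_rank([["d3", "d7", "d1"], ["d5", "d2"]],
                           [{"d7"}, {"d5"}]))  # (1/2 + 1/1) / 2 = 0.75
```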

Consider the total cost of ownership, including not just the model’s computational requirements but also the infrastructure needed to serve it, storage costs for the embeddings it produces, and the engineering effort required to deploy and maintain it. Sometimes a slightly less accurate but much more efficient model provides better overall value.

Vector Embeddings in RAG Systems

Retrieval-augmented generation (RAG) systems use vector embeddings as their foundation for finding relevant information to augment language model responses. Understanding how embeddings function in RAG architectures is essential for building effective systems that provide accurate, grounded answers.

The RAG Pipeline

RAG systems operate in two main phases: indexing and retrieval. During indexing, documents are split into chunks, each chunk is converted to an embedding vector, and these vectors are stored in a vector database along with the original text. This creates a searchable index where semantic similarity can be computed efficiently.

When a user submits a query, the RAG system embeds the query using the same embedding model used for documents. It then searches the vector database for chunks with embeddings most similar to the query embedding, typically using cosine similarity or dot product. The retrieved chunks, along with the original query, are provided as context to a language model, which generates a response grounded in the retrieved information.
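
A minimal in-memory sketch of both phases, with a placeholder `embed` function standing in for a real embedding model or API:

```python
import numpy as np

def embed(texts):
    """Placeholder for a real embedding model call (e.g. a sentence-transformers
    model or an embeddings API); returns one unit-normalized vector per text."""
    vecs = np.random.rand(len(texts), 384).astype(np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Indexing phase: embed each chunk once and keep vectors alongside the text.
chunks = ["Chunk one of a document...", "Chunk two...", "Chunk three..."]
chunk_vectors = embed(chunks)

# Retrieval phase: embed the query with the SAME model, rank chunks by similarity.
query_vector = embed(["What does chunk two say?"])[0]
scores = chunk_vectors @ query_vector            # dot product == cosine for normalized vectors
top_k = np.argsort(scores)[::-1][:2]

context = "\n\n".join(chunks[i] for i in top_k)
# `context` plus the original query would then be passed to the language model.
print(context)
```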

Chunking Strategies

How you split documents into chunks significantly affects RAG performance. Chunks must be large enough to contain meaningful context but small enough to be specific and relevant. Common chunk sizes range from 256 to 1024 tokens, with overlap between consecutive chunks to avoid splitting important information.

Fixed-size chunking is simple but can break up coherent ideas. Semantic chunking attempts to split documents at natural boundaries like paragraphs or topic shifts, producing more coherent chunks that embed better. Some advanced approaches use recursive splitting, starting with large chunks and subdividing only when necessary.

The chunk size should match your embedding model’s capabilities and your use case requirements. Longer chunks provide more context but may be less precise for specific queries. Shorter chunks are more targeted but might lack sufficient context for the language model to generate accurate responses.
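
A simple fixed-size chunker with overlap, operating on tokens from whatever tokenizer your embedding model uses (the sizes here are illustrative):

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Fixed-size chunking with overlap; `tokens` is a list of token strings
    (or ids) produced by the same tokenizer the embedding model uses."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(1200)]
pieces = chunk_tokens(tokens)
print(len(pieces), [len(p) for p in pieces])  # 3 chunks with 64-token overlap
```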

Embedding Quality and Retrieval Accuracy

The quality of your embeddings directly determines retrieval accuracy, which in turn affects the quality of generated responses. Poor embeddings lead to irrelevant chunks being retrieved, causing the language model to generate responses based on incorrect or off-topic information.

Embedding models trained specifically for retrieval tasks typically outperform general-purpose models in RAG systems. These models are fine-tuned to maximize the similarity between queries and relevant documents while minimizing similarity to irrelevant ones. They handle the asymmetric nature of search—short queries matching longer documents—more effectively than models trained on symmetric similarity tasks.

Domain adaptation can significantly improve RAG performance. If your documents use specialized terminology or cover specific topics, fine-tuning your embedding model on representative examples from your domain helps it learn the semantic relationships specific to your content.

Hybrid Search Approaches

Many production RAG systems combine vector similarity search with traditional keyword-based search in a hybrid approach. Vector search excels at semantic matching and handling paraphrases, while keyword search is precise for exact terms and proper nouns. Combining both approaches often yields better results than either alone.

Hybrid search typically involves retrieving candidates using both methods and then merging and re-ranking the results. The re-ranking step might use a more sophisticated model to score relevance or apply business logic to prioritize certain types of results.
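
One common merging strategy is reciprocal rank fusion; a minimal sketch, assuming each retriever returns an ordered list of document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids; k dampens the influence of top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["d4", "d1", "d9"]   # from embedding similarity search
keyword_hits = ["d1", "d7", "d4"]   # from BM25 / keyword search

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # d1 and d4 rise to the top
```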

Metadata Filtering

Effective RAG systems often combine vector similarity with metadata filtering. Each chunk is stored with metadata like document source, date, author, or category. Queries can specify filters—“find information about X from documents published after 2023”—that narrow the search space before or after vector similarity ranking.

Metadata filtering improves both accuracy and efficiency. By reducing the search space, you can retrieve more relevant results and reduce computational costs. Metadata can also enable features like source attribution, where the system shows users which documents contributed to the response.
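
A toy in-memory sketch of pre-filtering on metadata before similarity ranking; production vector databases apply such filters natively:

```python
import numpy as np

# Each stored chunk carries metadata alongside its embedding (placeholder values here).
index = [
    {"text": "Q3 revenue grew 12%...", "year": 2024, "source": "earnings",
     "vector": np.random.rand(384)},
    {"text": "2019 market overview...", "year": 2019, "source": "report",
     "vector": np.random.rand(384)},
]

query_vector = np.random.rand(384)

# Pre-filter by metadata, then rank only the survivors by vector similarity.
candidates = [c for c in index if c["year"] > 2023]
candidates.sort(key=lambda c: float(np.dot(c["vector"], query_vector)), reverse=True)
print([c["text"] for c in candidates])
```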

Evaluation and Optimization

RAG system performance should be evaluated end-to-end, not just on retrieval accuracy. Metrics should include whether the language model generates correct answers, whether retrieved chunks contain the necessary information, and whether the system handles edge cases appropriately.

Common optimization strategies include adjusting chunk size and overlap, tuning the number of retrieved chunks, experimenting with different embedding models, and implementing re-ranking stages. A/B testing different configurations with real user queries helps identify the most effective approach for your specific use case.

Best Practices for Working with Embeddings

Successfully deploying embedding-based systems requires attention to numerous practical considerations beyond just choosing a model and computing vectors. These best practices help ensure reliable, efficient, and maintainable systems.

Consistency in Embedding Generation

Always use the same embedding model and configuration for both indexing and querying. Using different models or versions produces embeddings in incompatible vector spaces, making similarity comparisons meaningless. If you need to update your embedding model, you must re-embed your entire document corpus.

Version control your embedding models and configurations. Track which model version generated which embeddings, and maintain the ability to reproduce embeddings if needed. This becomes critical when debugging issues or migrating to new models.

Normalize embeddings consistently. If your similarity metric assumes normalized vectors, ensure all embeddings are normalized immediately after generation. Inconsistent normalization can lead to subtle bugs where some comparisons work correctly while others produce unexpected results.
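
A small helper for consistent L2 normalization, applied immediately after embedding generation:

```python
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so the dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors

embeddings = np.random.rand(100, 768).astype(np.float32)
embeddings = normalize(embeddings)
print(np.linalg.norm(embeddings, axis=1)[:3])  # each row is now length ~1.0
```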

Handling Text Preprocessing

The text preprocessing applied before embedding significantly affects results. Decide on consistent rules for handling capitalization, punctuation, special characters, and whitespace. Some embedding models are case-sensitive, while others are not. Some handle punctuation meaningfully, while others ignore it.

Be cautious with aggressive preprocessing like stemming or lemmatization. Modern embedding models are trained on natural text and may perform worse on heavily preprocessed input. The model’s training data should guide your preprocessing decisions—if the model was trained on natural text with punctuation and capitalization, preserve those features.

For multilingual applications, ensure consistent language handling. Some models require language identification before embedding, while others handle multiple languages automatically. Mixing languages within a single text might require special handling depending on your model.

Efficient Storage and Indexing

Vector databases are essential for production systems that need to search millions of embeddings efficiently. These specialized databases use indexing structures like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) to enable fast approximate nearest neighbor search.

Choose storage precision carefully. Embeddings are typically generated as 32-bit floats, but many applications can use 16-bit or even 8-bit quantization with minimal accuracy loss. Reducing precision can halve or quarter storage requirements and improve search speed.
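
A rough sketch of per-vector int8 quantization with NumPy; most vector databases provide built-in quantization, so this is illustrative rather than a recommended production path:

```python
import numpy as np

embeddings = np.random.rand(1_000, 768).astype(np.float32)  # ~3.0 MB at 32-bit

# Simple symmetric int8 quantization: one scale factor per vector.
scales = np.abs(embeddings).max(axis=1, keepdims=True) / 127.0
quantized = np.round(embeddings / scales).astype(np.int8)    # ~0.77 MB

# Approximate reconstruction for similarity computation.
restored = quantized.astype(np.float32) * scales
print(np.abs(embeddings - restored).max())  # small quantization error
```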

Implement appropriate indexing strategies based on your query patterns. If you frequently filter by metadata before vector search, ensure your database indexes both metadata and vectors efficiently. Some databases support hybrid indexes that optimize for combined filtering and similarity search.

Monitoring and Quality Assurance

Monitor embedding quality in production. Track metrics like average similarity scores, distribution of retrieval results, and user engagement with retrieved content. Sudden changes in these metrics might indicate issues with embedding generation or data quality.

Implement validation checks for generated embeddings. Verify that embeddings have the expected dimensionality, contain no NaN or infinite values, and fall within reasonable ranges. Corrupted embeddings can cause subtle failures that are difficult to debug.
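
A minimal validation helper along these lines (the expected dimensionality and norm bounds are assumptions to adapt to your model):

```python
import numpy as np

EXPECTED_DIM = 768  # assumption: match your embedding model's output size

def validate_embedding(vec: np.ndarray) -> None:
    """Fail fast on malformed embeddings before they reach the index."""
    assert vec.shape == (EXPECTED_DIM,), f"unexpected shape {vec.shape}"
    assert np.isfinite(vec).all(), "embedding contains NaN or infinite values"
    norm = np.linalg.norm(vec)
    assert 1e-6 < norm < 1e3, f"suspicious vector norm {norm}"

validate_embedding(np.random.rand(EXPECTED_DIM).astype(np.float32))  # passes
```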

Maintain test sets of known similar and dissimilar text pairs. Regularly verify that your embedding system produces expected similarity scores for these test cases. This helps catch regressions when updating models or infrastructure.

Scaling Considerations

Batch embedding generation when possible. Processing multiple texts together is typically much more efficient than processing them individually. Most embedding models can process batches of dozens or hundreds of texts simultaneously with minimal latency increase.

Implement caching for frequently embedded texts. If certain queries or documents are embedded repeatedly, cache their embeddings to avoid redundant computation. This is particularly valuable for common queries in search applications.
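
A simple cache keyed on a hash of the input text, with a placeholder embedding call:

```python
import hashlib
import numpy as np

_cache: dict[str, np.ndarray] = {}

def embed_text(text: str) -> np.ndarray:
    # Stand-in for a real model or API call.
    return np.random.rand(384).astype(np.float32)

def embed_with_cache(text: str) -> np.ndarray:
    """Return a cached embedding if this exact text was embedded before."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_text(text)
    return _cache[key]

print(embed_with_cache("common query") is embed_with_cache("common query"))  # True
```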

Consider the trade-offs between embedding at index time versus query time. Pre-computing and storing embeddings for all documents requires more storage but enables faster queries. Computing embeddings on-demand reduces storage but increases query latency.

Security and Privacy

Be aware that embeddings can leak information about the original text. While embeddings are not directly human-readable, they can potentially be reverse-engineered or used to infer sensitive information. If working with confidential data, consider the security implications of storing and transmitting embeddings.

Implement appropriate access controls for embedding storage. Embeddings should be protected with the same security measures as the original documents they represent. In multi-tenant systems, ensure embeddings from different tenants are properly isolated.

Documentation and Reproducibility

Document your embedding pipeline thoroughly. Record the model used, preprocessing steps, chunking strategy, and any fine-tuning or adaptation applied. This documentation is essential for troubleshooting, reproducing results, and onboarding new team members.

Maintain example inputs and outputs. Save representative examples of texts and their embeddings, along with similarity scores for known pairs. These examples serve as regression tests and help validate that the system continues to work as expected after changes.

Conclusion

Vector embeddings have transformed how AI systems understand and process language, enabling semantic search, retrieval-augmented generation, and countless other applications. By representing text as points in high-dimensional space, embeddings capture meaning in a way that allows mathematical comparison and manipulation of semantic concepts.

The key to working effectively with embeddings is understanding the fundamentals: how embedding models transform text into vectors, what the dimensions represent, how to measure similarity, and how to choose appropriate models for your use case. Success in production systems requires attention to practical details like consistent preprocessing, efficient storage, and careful monitoring.

As embedding models continue to improve and new architectures emerge, the applications built on this technology will become even more powerful. Whether you’re building a simple semantic search system or a complex RAG application, a solid understanding of vector embeddings provides the foundation for creating effective, reliable AI systems that truly understand the meaning of text.


Further Exploration

For readers looking to deepen their understanding of vector embeddings and related technologies, several topics warrant further exploration:

Transformer Architectures: Understanding the neural network architectures that power modern embedding models, including attention mechanisms and how they enable contextual understanding.

Vector Databases: Specialized databases designed for storing and searching high-dimensional vectors efficiently, including indexing algorithms like HNSW and IVF that enable fast similarity search at scale.

Fine-tuning Embedding Models: Techniques for adapting pre-trained embedding models to specific domains or tasks, including contrastive learning approaches and domain adaptation strategies.

Semantic Search Systems: Building end-to-end search applications using embeddings, including query processing, result ranking, and user interface considerations.

Retrieval-Augmented Generation (RAG): Deep dive into RAG architectures, including advanced techniques like hybrid search, re-ranking, and handling multi-hop reasoning.

Dimensionality Reduction: Techniques like PCA, t-SNE, and UMAP for visualizing and compressing high-dimensional embeddings while preserving semantic relationships.

Multimodal Embeddings: Models that embed multiple types of content (text, images, audio) into a shared vector space, enabling cross-modal search and understanding.

Evaluation Metrics for Embeddings: Methods for assessing embedding quality, including intrinsic metrics like word similarity correlation and extrinsic metrics based on downstream task performance.
