
🔍 How Mathematics Powers Machine Learning in spaCy

🔢 1. Linear Algebra in spaCy

In spaCy, tokens, spans, and whole documents can each be represented as a vector in a high-dimensional space. Pipelines that ship word vectors, such as en_core_web_md, place words with related meanings close together, which lets the model capture semantic relationships between them.

Vector Representation: Words like "apple" are represented as vectors in ℝⁿ:
v_apple = [0.2, -0.1, 0.4, ..., 0.5]

Cosine Similarity: Used to compute similarity between word vectors:
cos(θ) = (v₁ ⋅ v₂) / (||v₁|| × ||v₂||)
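Example:

        import spacy
        # Requires the model: python -m spacy download en_core_web_md
        nlp = spacy.load("en_core_web_md")
        doc1 = nlp("apple")
        doc2 = nlp("orange")
        # By default, cosine similarity of the averaged word vectors
        print(doc1.similarity(doc2))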
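To make the cosine formula concrete, here is a minimal dependency-free sketch that computes it by hand. The four-dimensional vectors are made-up illustrative values, not real spaCy vectors, and the function name cosine_similarity is just a label chosen for this example.

```python
import math

def cosine_similarity(v1, v2):
    """Return (v1 . v2) / (||v1|| * ||v2||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # A zero vector has no direction; returning 0.0 here is a convention
    # chosen for this sketch.
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

# Hypothetical toy vectors (real word vectors have hundreds of dimensions).
v_apple = [0.2, -0.1, 0.4, 0.5]
v_orange = [0.1, -0.2, 0.5, 0.4]

print(cosine_similarity(v_apple, v_orange))
```

Values near 1 mean the vectors point in almost the same direction, values near 0 mean they are unrelated, and negative values mean they point in opposite directions.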

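Matrix Multiplication: The core operation of a neural network layer, which maps an input vector x to an output h using a weight matrix W and a bias vector b:
h = W ⋅ x + b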
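As a rough illustration of h = W ⋅ x + b, the sketch below applies a single layer in plain Python. The helper name dense_layer and the numbers are assumptions made for this example; real models delegate this to an optimized array library rather than Python lists.

```python
def dense_layer(W, x, b):
    """Compute h = W . x + b, with W given as a list of rows."""
    if len(W) != len(b) or any(len(row) != len(x) for row in W):
        raise ValueError("shape mismatch between W, x, and b")
    # Each output is the dot product of one row of W with x, plus a bias.
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Hypothetical 2x3 weight matrix: maps a 3-dim input to a 2-dim output.
W = [[1.0, 0.0, -1.0],
     [0.5, 2.0, 0.0]]
x = [2.0, 1.0, 3.0]
b = [0.5, -1.0]

h = dense_layer(W, x, b)
print(h)  # [-0.5, 2.0]
```

Each row of W produces one output value, so the number of rows sets the size of h.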

🎲 2. Probability & Statistics in spaCy

spaCy uses statistical modeling for tasks like text classification and named entity recognition. Here's how the math fits in:

Softmax Function: Converts raw scores (logits) zᵢ into a probability distribution over the classes:
P(yᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ)
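Here is a small sketch of the softmax formula in plain Python. Subtracting the largest score before exponentiating is a common numerical-stability trick that leaves the result unchanged. The scores are hypothetical raw outputs for three labels.

```python
import math

def softmax(z):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(z)  # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three labels.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

print(probs)
print(sum(probs))
```

The largest score always gets the largest probability, and the ordering of the scores is preserved.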
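
Bayes’ Theorem: Relates the probability of a class given the observed features to the likelihood of those features and the class prior, the basis of probabilistic classifiers such as naive Bayes: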
P(Class | Features) = (P(Features | Class) × P(Class)) / P(Features)
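A worked example may help with Bayes' theorem. The sketch below uses invented numbers for a two-class spam filter: the priors and the likelihood of seeing the word "free" in each class are assumptions made purely for illustration, and posterior is a hypothetical helper name.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem to get P(class | features) for every class.

    prior:      dict mapping class -> P(class)
    likelihood: dict mapping class -> P(features | class)
    """
    joint = {c: likelihood[c] * prior[c] for c in prior}
    evidence = sum(joint.values())  # P(features), the denominator
    if evidence == 0.0:
        raise ValueError("features have zero probability under every class")
    return {c: joint[c] / evidence for c in joint}

# Invented numbers: 20% of messages are spam, and the word "free"
# appears in 60% of spam but only 5% of non-spam ("ham").
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {"spam": 0.6, "ham": 0.05}

post = posterior(prior, likelihood)
print(post)
```

With these numbers the posterior for spam works out to 0.12 / 0.16 = 0.75, even though only 20% of messages were spam to begin with.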

Cross-Entropy Loss: Measures how far the predicted probabilities ŷ are from the true labels y (usually one-hot), and is the quantity that training tries to minimize:
L = -Σ yᵢ log(ŷᵢ)
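The following sketch evaluates the cross-entropy formula for one example with a one-hot label. The two prediction vectors are made up to contrast a confident correct guess with an unsure one, and the eps clamp is an assumption added here to avoid log(0).

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Return -sum(y_i * log(y_hat_i)) for one example."""
    # Clamp predictions away from zero so log() stays finite.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]        # the correct class is index 1
confident = [0.05, 0.9, 0.05]   # hypothetical confident prediction
unsure = [0.4, 0.3, 0.3]        # hypothetical unsure prediction

print(cross_entropy(y_true, confident))
print(cross_entropy(y_true, unsure))
```

With a one-hot label the sum collapses to -log of the probability assigned to the correct class, so the confident prediction gets the smaller loss.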

🧠 Summary Table

Area	Math Concept	How It's Used in spaCy
Word Similarity	Vectors & Dot Product	Used in cosine similarity
Classification	Softmax, Cross-Entropy	Entity and text classification
Representation	Matrices	Model weight structures
Learning	Gradient Descent	Trains model to minimize loss
Language Modeling	Probability Theory	Scores likely labels, such as intent categories
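The Learning row of the table can be illustrated with a toy training loop. This is a minimal sketch, not how spaCy trains its models: it runs plain gradient descent directly on three logits for a single example, using the standard result that the gradient of softmax cross-entropy with respect to the logits is ŷ − y. The learning rate and step count are arbitrary choices.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred):
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

logits = [0.0, 0.0, 0.0]   # start with no preference between classes
target = [0.0, 1.0, 0.0]   # one-hot label: class 1 is correct
learning_rate = 0.5        # arbitrary value for this sketch

losses = []
for step in range(20):
    probs = softmax(logits)
    losses.append(cross_entropy(target, probs))
    # Gradient of the loss with respect to each logit is (y_hat - y).
    grad = [p - t for p, t in zip(probs, target)]
    # Move each logit a small step against its gradient.
    logits = [z - learning_rate * g for z, g in zip(logits, grad)]

print(losses[0], losses[-1])
print(softmax(logits))
```

The loss starts at log 3 (about 1.10, the value for a uniform guess over three classes) and shrinks at every step as probability mass moves to the correct class.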