Foundational knowledge is like the foundation of a house.
In spaCy, words, sentences, and documents are represented as vectors in a high-dimensional space. These vectors let the model capture semantic meaning and relationships between words.
For example, the word "apple" is represented as a vector in ℝⁿ: v_apple = [0.2, -0.1, 0.4, ..., 0.5]
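The similarity between two vectors v₁ and v₂ is measured by the cosine of the angle θ between them:
cos(θ) = (v₁ ⋅ v₂) / (||v₁|| × ||v₂||)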
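```python
import spacy

# Requires the medium English pipeline, which ships with word vectors:
#   python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")
doc1 = nlp("apple")
doc2 = nlp("orange")
print(doc1.similarity(doc2))  # similarity score between the two documents
```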
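To connect the cosine formula to the `similarity` call, the sketch below computes cos(θ) by hand in plain Python. The three-dimensional vectors are made up for illustration and are not real spaCy embeddings, and `cosine_similarity` is a hypothetical helper, not a spaCy function. Returning 0.0 for a zero vector is a convention chosen here.

```python
import math

def cosine_similarity(v1, v2):
    """Return cos(θ) = (v1 · v2) / (||v1|| × ||v2||) for two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm1 * norm2)

# Toy vectors, invented for this example only.
v_apple = [0.2, -0.1, 0.4]
v_orange = [0.1, -0.2, 0.5]
print(cosine_similarity(v_apple, v_orange))
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and opposite vectors score -1.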
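Inside a neural network layer, an input vector x is transformed by a weight matrix W and a bias vector b:
h = W ⋅ x + b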
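As a small worked instance of h = W ⋅ x + b, the sketch below applies a 2×3 weight matrix to a 3-dimensional input in plain Python. The values of `W`, `x`, and `b` are invented for illustration, and `linear_layer` is a hypothetical helper rather than anything from spaCy.

```python
def linear_layer(W, x, b):
    """Compute h = W · x + b, where W is a list of rows, x the input, b the bias."""
    if any(len(row) != len(x) for row in W) or len(W) != len(b):
        raise ValueError("shape mismatch between W, x, and b")
    # Each output component is the dot product of one row of W with x, plus its bias.
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]

W = [[1.0, 0.0, 2.0],
     [0.5, -1.0, 0.0]]
x = [1.0, 2.0, 3.0]
b = [0.5, 1.0]

h = linear_layer(W, x, b)
print(h)  # [7.5, -0.5]
```

The output has one entry per row of W, so the matrix shape determines how the layer changes the dimensionality of the representation.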
spaCy uses statistical modeling for tasks like text classification and named entity recognition. Here's how the math fits in:
Softmax converts the raw scores zᵢ into class probabilities:
P(yᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ)
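A minimal sketch of the softmax formula in plain Python follows. The scores are invented, and subtracting the maximum score before exponentiating is a common numerical-stability trick added here. It does not change the result, because the shift cancels in the ratio.

```python
import math

def softmax(z):
    """Return P(y_i) = exp(z_i) / sum_j exp(z_j) for a list of raw scores."""
    m = max(z)  # shift by the max so exp() cannot overflow
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # made-up scores for three classes
probs = softmax(scores)
print(probs)
print(sum(probs))  # the probabilities sum to 1 (up to floating-point rounding)
```

The largest score always receives the largest probability, and every output stays strictly between 0 and 1.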
Bayes' theorem relates the probability of a class to the observed features:
P(Class | Features) = (P(Features | Class) × P(Class)) / P(Features)
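To make Bayes' theorem concrete, the sketch below computes a posterior for a two-class toy problem. The class names, priors, and likelihoods are all invented for illustration, and `posterior` is a hypothetical helper. P(Features) is obtained by summing P(Features | Class) × P(Class) over the classes.

```python
def posterior(likelihoods, priors):
    """Return P(Class | Features) for each class.

    likelihoods[c] is P(Features | c) and priors[c] is P(c).
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(Features)
    if evidence == 0.0:
        raise ValueError("features have zero probability under every class")
    return {c: joint[c] / evidence for c in joint}

priors = {"spam": 0.2, "ham": 0.8}        # made-up P(Class)
likelihoods = {"spam": 0.6, "ham": 0.05}  # made-up P(Features | Class)

post = posterior(likelihoods, priors)
print(post)  # spam ≈ 0.75, ham ≈ 0.25
```

Although "spam" has the lower prior here, the features are much more likely under it, so the posterior favors "spam".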
Cross-entropy loss compares the true labels yᵢ with the predicted probabilities ŷᵢ:
L = -Σ yᵢ log(ŷᵢ)
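The loss L = -Σ yᵢ log(ŷᵢ) can be sketched in a few lines of plain Python. The label and prediction vectors below are invented, and the small `eps` floor is an added safeguard against log(0), not part of the formula.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Return L = -sum_i y_i * log(ŷ_i) for one example."""
    # Flooring predictions at eps avoids math.log(0) raising an error.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]          # one-hot label: the second class is correct
confident = [0.05, 0.90, 0.05]    # prediction that favors the correct class
unsure = [0.40, 0.30, 0.30]       # prediction that spreads its probability

print(cross_entropy(y_true, confident))
print(cross_entropy(y_true, unsure))
```

With a one-hot label, the loss reduces to the negative log of the probability assigned to the correct class, so the confident correct prediction yields the smaller loss.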
| Area | Math Concept | How It's Used in spaCy |
|---|---|---|
| Word Similarity | Vectors & Dot Product | Used in cosine similarity |
| Classification | Softmax, Cross-Entropy | Entity and text classification |
| Representation | Matrices | Model weight structures |
| Learning | Gradient Descent | Trains model to minimize loss |
| Language Modeling | Probability Theory | Predicts next token or intent |
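The "Gradient Descent" row can be illustrated with a deliberately tiny example: minimizing the one-dimensional loss f(w) = (w - 3)², whose gradient is 2(w - 3). This is a sketch of the general idea only. The loss, learning rate, and step count are assumptions chosen for illustration and say nothing about how spaCy's own training loop is configured.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)  # move downhill on the loss surface
    return w

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_final = gradient_descent(grad, w0=0.0)
print(w_final)  # converges toward the minimizer w = 3
```

Each step shrinks the distance to the minimum by a constant factor here. Real models apply the same update to many weights at once.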