Just as a house needs a solid foundation, working effectively with spaCy requires a grasp of the math underneath it.
In spaCy, words, sentences, and documents are represented as vectors in a high-dimensional space. These vectors allow the model to capture semantic meaning and relationships between words.
"apple"
are represented as vectors in ℝⁿ
:v_apple = [0.2, -0.1, 0.4, ..., 0.5]
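To see this concretely, a token's vector can be inspected directly. This is a minimal sketch assuming the en_core_web_md pipeline is installed; its vectors have 300 dimensions:

```python
import spacy

nlp = spacy.load("en_core_web_md")

# Each token exposes its embedding as a NumPy array.
apple = nlp("apple")[0]
print(apple.vector.shape)  # (300,) for en_core_web_md
print(apple.vector[:5])    # first few components of v_apple
```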
The similarity between two vectors is measured as the cosine of the angle between them:

cos(θ) = (v₁ ⋅ v₂) / (||v₁|| × ||v₂||)
In spaCy, this comparison is a one-liner:

```python
import spacy

# The medium model ships with word vectors; the small model (en_core_web_sm) does not.
nlp = spacy.load("en_core_web_md")

doc1 = nlp("apple")
doc2 = nlp("orange")

# similarity() returns the cosine similarity of the two document vectors.
print(doc1.similarity(doc2))
```
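The number printed above can be reproduced by applying the cosine formula to the raw vectors. This sketch continues from the snippet above, using NumPy for the dot product and norms:

```python
import numpy as np

v1, v2 = doc1.vector, doc2.vector

# cos(θ) = (v₁ ⋅ v₂) / (||v₁|| × ||v₂||)
cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cosine)  # matches doc1.similarity(doc2) up to floating-point error
```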
Under the hood, model weights are stored as matrices, and each neural layer applies an affine transformation to its input:

h = W ⋅ x + b
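As a toy illustration of that computation (the layer sizes here are made up for the example, not spaCy's internal architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, for illustration only.
n_in, n_out = 300, 64
W = rng.standard_normal((n_out, n_in))   # weight matrix
b = np.zeros(n_out)                      # bias vector
x = rng.standard_normal(n_in)            # input vector, e.g. a word embedding

h = W @ x + b                            # affine transformation h = W ⋅ x + b
print(h.shape)                           # (64,)
```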
spaCy uses statistical modeling for tasks like text classification and named entity recognition. Here's how the math fits in:
The softmax function turns raw scores zᵢ into a probability distribution over classes:

P(yᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ)

Bayes' theorem relates the probability of a class given observed features to quantities the model can estimate:

P(Class | Features) = (P(Features | Class) × P(Class)) / P(Features)

Training minimizes the cross-entropy loss between the predicted probabilities ŷᵢ and the true labels yᵢ:

L = -Σ yᵢ log(ŷᵢ)
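Both softmax and the loss fit in a few lines of NumPy. This is a toy sketch with invented scores, not spaCy code:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])      # raw scores for three classes
y = np.array([1.0, 0.0, 0.0])      # one-hot true label
y_hat = softmax(z)                 # P(yᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ)

loss = -np.sum(y * np.log(y_hat))  # cross-entropy L = -Σ yᵢ log(ŷᵢ)
print(y_hat, loss)
```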
| Area | Math Concept | How It's Used in spaCy |
|---|---|---|
| Word Similarity | Vectors & Dot Product | Used in cosine similarity |
| Classification | Softmax, Cross-Entropy | Entity and text classification |
| Representation | Matrices | Model weight structures |
| Learning | Gradient Descent | Trains the model to minimize loss |
| Language Modeling | Probability Theory | Predicts the next token or intent |
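As a minimal illustration of the gradient-descent row, the update rule w ← w − lr ⋅ ∇L drives a weight toward the minimum of a loss. This one-dimensional toy is not spaCy's actual optimizer:

```python
# Minimize L(w) = (w - 3)² by gradient descent; dL/dw = 2(w - 3).
w, lr = 0.0, 0.1
for step in range(50):
    grad = 2 * (w - 3)
    w -= lr * grad   # update rule: w ← w − lr ⋅ ∇L
print(w)             # ≈ 3.0, the minimizer
```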