Problem: Top K Most Similar Documents
You are given:
a floating-point array queryEmb of length D, representing a query embedding
a 2D floating-point array docEmbs of size N x D, representing N document embeddings (one per row)
an integer k
All embeddings are already L2-normalized.
The cosine similarity between two normalized vectors is equal to their dot product.
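To see why the dot product is enough, recall that cosine similarity divides the dot product by the product of the two vector lengths, and both lengths are 1 after L2 normalization. The small check below illustrates this in plain Python; the helper names l2_normalize, dot, and cosine are made up for this illustration and are not part of the problem.

```python
import math

def l2_normalize(v):
    # Scale v so that its Euclidean (L2) length is 1.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Full cosine formula: dot product divided by the product of the lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

q = l2_normalize([3.0, 4.0])  # becomes [0.6, 0.8]
d = l2_normalize([1.0, 0.0])  # already unit length

# For unit-length vectors the denominator is 1, so both values agree.
print(math.isclose(dot(q, d), cosine(q, d)))  # True
```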
Return the indices of the k documents with the highest cosine similarity to queryEmb, ordered from most similar to least similar.
If k > N, return all N document indices, ordered from most similar to least similar.
Function Signature
def topKSimilar(queryEmb: np.ndarray, docEmbs: np.ndarray, k: int) -> np.ndarray
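One possible approach, sketched below under a few assumptions, computes all N dot products with a single matrix-vector product, selects the k largest with np.argpartition, and then sorts only those k candidates. The problem statement does not say how ties or a non-positive k should be handled, so the choices here (an empty result for k <= 0, no guarantee about which tied document survives the cutoff) are assumptions rather than requirements.

```python
import numpy as np

def topKSimilar(queryEmb: np.ndarray, docEmbs: np.ndarray, k: int) -> np.ndarray:
    # Dot product of every document row with the query: shape (N,).
    # Because the inputs are L2-normalized, these are the cosine similarities.
    scores = docEmbs @ queryEmb
    n = scores.shape[0]
    k = min(k, n)  # k > N: fall back to returning all N indices
    if k <= 0:
        # Assumption: a non-positive k (or an empty corpus) yields an empty result.
        return np.empty(0, dtype=np.intp)
    if k < n:
        # Partial selection: after partitioning, the last k positions hold
        # the indices of the k largest scores, in no particular order.
        candidates = np.argpartition(scores, n - k)[n - k:]
    else:
        candidates = np.arange(n)
    # Sort the candidates by index first so the stable sort below breaks
    # ties among the selected documents in favor of the lower index.
    candidates = np.sort(candidates)
    order = np.argsort(-scores[candidates], kind="stable")
    return candidates[order]

query = np.array([1.0, 0.0])
docs = np.array([[0.0, 1.0], [1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]])
print(topKSimilar(query, docs, 2))   # indices 1 and 2, in that order
print(topKSimilar(query, docs, 10))  # k > N: all four indices, best first
```

The partial selection keeps the cost at roughly O(N*D + N + k log k) instead of sorting all N scores; for small N a plain np.argsort over all scores would be simpler and just as acceptable.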