How to Implement a Cosine Similarity Function in TypeScript

by alexop on 3/9/25, 9:20 AM with 40 comments
by ashvardanian on 3/10/25, 9:54 PM

It’s a nice post, but “using array methods” probably shouldn’t be placed in the “Efficient Implementation” section. As often happens with high-level languages, a single plain old loop is faster than three array methods.
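A minimal sketch of the "single plain old loop" idea, for illustration only. The function name `cosineSimilarityLoop` and the choice to return 0 for zero vectors are assumptions, not something taken from the post:

```typescript
// Hypothetical helper: cosine similarity computed in one pass over both vectors.
function cosineSimilarityLoop(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let sumSquaresA = 0;
  let sumSquaresB = 0;
  // Accumulate the dot product and both squared magnitudes together,
  // instead of walking the arrays three times with map/reduce.
  for (let i = 0; i < a.length; i++) {
    const x = a[i];
    const y = b[i];
    dot += x * y;
    sumSquaresA += x * x;
    sumSquaresB += y * y;
  }
  const denominator = Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB);
  // Assumption: a zero vector counts as "no similarity" rather than NaN.
  return denominator === 0 ? 0 : dot / denominator;
}
```

Whether this actually beats the array-method version depends on the engine and the vector sizes, so it is worth benchmarking on your own data.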

Similarly, if you plan to query those vectors in search, you should consider contiguous `TypedArray` types and smaller scalars than the double-precision `number`.
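As a rough sketch of what that could look like, the snippet below packs all vectors back to back into one `Float32Array` and compares a query against a slot in that buffer. The names `packVectors` and `cosineAt` and the row-major layout are assumptions made for illustration:

```typescript
// Hypothetical layout: vectors of dimension `dim` stored back to back in one
// Float32Array, so vector i occupies indices [i * dim, (i + 1) * dim).
function packVectors(vectors: number[][], dim: number): Float32Array {
  const packed = new Float32Array(vectors.length * dim);
  vectors.forEach((v, i) => {
    if (v.length !== dim) {
      throw new Error(`Vector ${i} has length ${v.length}, expected ${dim}`);
    }
    packed.set(v, i * dim); // values are rounded to 32-bit floats here
  });
  return packed;
}

// Cosine similarity between `query` and the vector stored at `index`.
function cosineAt(packed: Float32Array, dim: number, index: number, query: Float32Array): number {
  if (query.length !== dim) {
    throw new Error("Query dimension does not match the stored vectors");
  }
  const offset = index * dim;
  let dot = 0;
  let sumSquaresV = 0;
  let sumSquaresQ = 0;
  for (let i = 0; i < dim; i++) {
    const v = packed[offset + i];
    const q = query[i];
    dot += v * q;
    sumSquaresV += v * v;
    sumSquaresQ += q * q;
  }
  const denominator = Math.sqrt(sumSquaresV * sumSquaresQ);
  // Assumption: zero vectors score 0 instead of NaN.
  return denominator === 0 ? 0 : dot / denominator;
}
```

The trade-off is precision: `Float32Array` halves the memory per scalar compared with `number`, at the cost of roughly seven significant decimal digits instead of about sixteen.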

I know very little about JS, but some of the amazing HackerNews community members have previously helped port SimSIMD to JavaScript (https://github.com/ashvardanian/SimSIMD), and I wrote a blog post covering some of those JS/TS-specifics, NumJS, and MathJS in 2023 (https://ashvardanian.com/posts/javascript-ai-vector-search/).

Hopefully, it should help unlock another 10-100x in performance.

by frotaur on 3/11/25, 2:02 AM

As a physicist, I always found it funny that in ML people renamed 'the angle between vectors' into something as fancy-feeling as 'cosine similarity'.

by dvt on 3/11/25, 1:19 AM

Great post, but what struck me (again, like every time I look at cos similarity) is how unreasonably well it works. It's just one of those things that's so "weird" about our world: why would cosine similarity work in n-dimensional semantic spaces? It's so stupid simple, and it intuitively makes sense, and it works really well. Crazy cool.

I'm reminded of that old quote, usually attributed to Einstein: "The most incomprehensible thing about the universe is that it is comprehensible."

by schappim on 3/10/25, 9:01 PM

I attempted to implement this on the front end of my e-commerce site, which has approximately 20,000 products (see gist [1]). My goal was to enhance search speed by performing as many local operations as possible.

The biggest performance gain came from moving to dot products.
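One common reading of "moving to dot products" is to normalize every embedding to unit length once, up front, so that cosine similarity at query time reduces to a plain dot product. That interpretation is an assumption, and the helper names `normalize` and `dot` below are hypothetical:

```typescript
// Scale a vector to unit length once, at indexing time.
function normalize(v: number[]): Float32Array {
  let sumSquares = 0;
  for (let i = 0; i < v.length; i++) {
    sumSquares += v[i] * v[i];
  }
  const norm = Math.sqrt(sumSquares);
  const out = new Float32Array(v.length);
  if (norm === 0) {
    return out; // assumption: leave zero vectors as all zeros
  }
  for (let i = 0; i < v.length; i++) {
    out[i] = v[i] / norm;
  }
  return out;
}

// For unit-length inputs, this dot product equals the cosine similarity.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

This removes two square roots and two sum-of-squares passes from every comparison, but it does nothing about the size of the embedding index itself.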

Regrettably, the sheer size of the index of embeddings rendered it impractical to achieve the desired results.

1. https://gist.github.com/schappim/d4a6f0b29c1ef4279543f6b6740...

by itishappy on 3/10/25, 9:18 PM

Those are some janky diagrams. The labels are selectable, and therefore are repeatedly highlighted and un-highlighted while dragging the vector around. The "direction only" arrow prevents you from changing the magnitude, but it doesn't prevent said magnitude from changing and it does so often because the inputs are quantized but the magnitude isn't. Multiple conventions for decimals are used within the same diagram. The second diagram doesn't center the angle indicator on the actual angle. Also the "send me feedback on X" popup doesn't respond to the close button, but then disappeared when I scrolled again so maybe it did? I'm running Chrome 134.0.6998.36 for Windows 10 Enterprise 22H2 build 19045.5487.

This whole thing looks like unreviewed AI. Stylish but fundamentally broken. I haven't had a chance to dig into the meat of the article yet, but unfortunately this is distracting enough that I'm not sure I will.

Edit: I'm digging into the meat, and it's better! Fortunately, it appears accurate. Unfortunately, it's rather repetitive. There are two paragraphs discussing the meaning of -1, 0, and +1 interleaved with multiple paragraphs explaining how cosine similarity allows vectors to be compared regardless of magnitude. The motivation is spread throughout the whole thing and repetitive, and the real world examples seem similar though formatted just differently enough to make it hard to tell at a glance.

To try to offer suggestions instead of just complaining... Here's my recommended edits:

I'd move the simple English explanation to the top after the intro, then remove everything but the diagrams until you reach the example. I'd completely remove the explanation of vectors unless you're going to include an explanation of dot products. I really like the functional approach, but feel like you could combine it with the `Math.hypot` example (leave the full formula as a comment, the rest is identical), and with the full example (although it's missing the `Math.hypot` optimization). Finally, I feel like you could get away with just one real web programming example, though don't know which one I'd choose. The last section about using OpenAI for embedding and its disclaimer is already great!
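A sketch of what the combined version suggested above might look like, with the full formula kept as a comment. This is not the article's code; the names `dotProduct` and `cosineSimilarity` and the zero-vector handling are assumptions:

```typescript
// cos(θ) = (a · b) / (‖a‖ · ‖b‖), where ‖v‖ = sqrt(v₁² + v₂² + … + vₙ²)
const dotProduct = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

const cosineSimilarity = (a: number[], b: number[]): number => {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  // Math.hypot(...v) computes sqrt(v₁² + … + vₙ²), i.e. the magnitude ‖v‖.
  const magnitudes = Math.hypot(...a) * Math.hypot(...b);
  // Assumption: a zero vector yields 0 rather than NaN.
  return magnitudes === 0 ? 0 : dotProduct(a, b) / magnitudes;
};
```

Spreading a vector into `Math.hypot` passes every element as a separate argument, so for very long vectors a loop may be the safer choice.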

by ZoomZoomZoom on 3/10/25, 11:47 PM

I've just written a tag similarity calculation for a custom set of tagged documents, and if you're going to actually use the metrics, it's probably a good idea to fill a similarity table instead of recalculating it all on the spot.

While doing that, the next logical step is precalculating the squared magnitudes for each item, and in my small test case of <1000 items that sped the calculation up almost 300 times. The gains are not exponential, but the savings on that constant for each pair considered are not insignificant, especially on small data sets (of course, with large sets a table won't cut it due to memory limitations and requires more sophisticated techniques).
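The table-plus-precomputed-magnitudes approach could be sketched roughly as follows. The function name `similarityTable` and the flat `n * n` layout are assumptions chosen for illustration, and the speedup quoted above is not something this sketch reproduces or verifies:

```typescript
// Hypothetical helper: builds a flat n-by-n table where entry [i * n + j]
// holds the cosine similarity between items[i] and items[j].
function similarityTable(items: number[][]): Float64Array {
  const n = items.length;
  // Precompute each item's squared magnitude once instead of once per pair.
  const squaredMagnitudes = items.map((v) => v.reduce((sum, x) => sum + x * x, 0));
  const table = new Float64Array(n * n);
  for (let i = 0; i < n; i++) {
    // A non-zero vector is always perfectly similar to itself.
    table[i * n + i] = squaredMagnitudes[i] === 0 ? 0 : 1;
    for (let j = i + 1; j < n; j++) {
      if (items[i].length !== items[j].length) {
        throw new Error(`Items ${i} and ${j} have different lengths`);
      }
      let dot = 0;
      for (let k = 0; k < items[i].length; k++) {
        dot += items[i][k] * items[j][k];
      }
      const denominator = Math.sqrt(squaredMagnitudes[i] * squaredMagnitudes[j]);
      const similarity = denominator === 0 ? 0 : dot / denominator;
      // Cosine similarity is symmetric, so fill both halves of the table.
      table[i * n + j] = similarity;
      table[j * n + i] = similarity;
    }
  }
  return table;
}
```

The table needs `n * n` entries, which is why this only makes sense for small sets, as noted above.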

by fergie on 3/11/25, 7:15 AM

Very well explained (although not sure if the TypeScript is necessary)

I wonder though: isn't classic TF-IDF scoring also a type of basic vector comparison? What is the actual difference between a "vector database" and something like Elasticsearch?

by yesthisiswes on 3/11/25, 12:16 AM

I liked your article. The chart with the vectors on it was cool though kinda hard to use on mobile.

I went to the typescript tag and tried to read a few other articles and got 404 errors. Just wanted to let ya know.

Nice blog and good work!

by raverbashing on 3/11/25, 6:29 AM

Meanwhile in Python this is just

    (a @ b.T)/(np.linalg.norm(a)*np.linalg.norm(b))

by sunami-ai on 3/11/25, 3:14 AM

I do all my work in Rust now via o3-mini-high and convert to WASM... JS just for DOM and event handling. What is the point of building these CPU-bound functions in TS? Why not Rust->WASM?