Venkat Raman
1 min read · Aug 12, 2019

Thanks for your response.

The problem mentioned in the article could be solved by other methods as well; it does not imply that only Redis can solve it. I used Redis simply because I was familiar with Redis and spaCy.

The reason it would take days without Redis (or any such in-memory data store) is that:

  • Some laptops may not have enough computing power, and Python is inherently slow (even multiprocessing does not help much).
  • The word-similarity exercise is complex at this scale. Each word from the 200k list gets compared to each word in the 10k list, a cosine similarity score is assigned to each pair, and finally the 2–3 words with the highest cosine scores are extracted and written to a CSV file or database.
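For concreteness, here is a minimal sketch of the matching step described in the second bullet, using spaCy vectors and NumPy rather than the Redis pipeline from the article. The model name, file names, and top_k value are illustrative assumptions, not the original code.

```python
# Minimal sketch of the 200k-vs-10k word-matching step -- assumptions:
# spaCy's medium English model, plain text word lists, top 3 matches per word.
import csv
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # model that ships with word vectors


def load_words(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def vectors_for(words):
    """Return (kept_words, matrix) for the words that have a vector."""
    kept, vecs = [], []
    for w in words:
        lex = nlp.vocab[w]
        if lex.has_vector:
            kept.append(w)
            vecs.append(lex.vector)
    mat = np.vstack(vecs)
    # Pre-normalise rows so cosine similarity reduces to a dot product.
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    return kept, mat


small_words, small_mat = vectors_for(load_words("10k_words.txt"))    # ~10k list
large_words, large_mat = vectors_for(load_words("200k_words.txt"))   # ~200k list

top_k = 3
with open("best_matches.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["word", "match", "cosine_similarity"])
    for word, vec in zip(large_words, large_mat):
        sims = small_mat @ vec                       # cosine scores vs 10k list
        best = np.argpartition(-sims, top_k)[:top_k]
        best = best[np.argsort(-sims[best])]         # order the top-k by score
        for idx in best:
            writer.writerow([word, small_words[idx], float(sims[idx])])
```

Even vectorised like this, roughly 200k x 10k comparisons are expensive in pure Python orchestration, which is why keeping the vectors in an in-memory store and parallelising the lookups made such a difference in the original write-up.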

The data I used was confidential, so it can't be shared, but you could certainly replicate my findings with your own data set. The code is real, working code, not pseudocode. I would love to learn about your approach using simpler data structures, though.


Written by Venkat Raman

Co-Founder of Aryma Labs. Data scientist/Statistician with business acumen. Hoping to amass knowledge and share it throughout my life. Rafa Nadal Fan.
