Thanks for your response.
The problem mentioned in the article could be solved by other methods as well; it does not show that only Redis can solve it. I used Redis simply because I was familiar with Redis and spaCy.
The reason it would take days without Redis or a similar in-memory data store is that:
- Some laptops may not have enough computing power, and pure Python is inherently slow for this kind of workload (even multiprocessing does not help much).
- The word-similarity exercise is complex at this scale. Each word from the 200k list gets compared to each word in the 10k list, a cosine similarity score is assigned to each pair, and finally the 2–3 words with the highest cosine scores are extracted and written to a CSV file or database.
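To make the comparison step concrete, here is a minimal NumPy sketch of the all-pairs cosine-similarity-plus-top-k idea. The function name, variable names, and the scaled-down sizes are my own illustration, not the original code (which used spaCy word vectors and Redis); it assumes the vectors have already been loaded into arrays.

```python
import numpy as np

def top_matches(source_vecs, target_vecs, k=3):
    """For each row of source_vecs, return the indices and cosine
    scores of the k most similar rows in target_vecs."""
    # Normalize rows so a plain dot product equals cosine similarity
    src = source_vecs / np.linalg.norm(source_vecs, axis=1, keepdims=True)
    tgt = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T                            # shape: (n_source, n_target)
    top_idx = np.argsort(-sims, axis=1)[:, :k]    # best k columns per row
    top_scores = np.take_along_axis(sims, top_idx, axis=1)
    return top_idx, top_scores

# Scaled-down demo: 200 "source" words vs 100 "target" words, 50-dim vectors
# (the real exercise was 200k vs 10k)
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 50))
target = rng.normal(size=(100, 50))
idx, scores = top_matches(source, target, k=3)
print(idx.shape, scores.shape)  # (200, 3) (200, 3)
```

Even vectorized like this, a full 200k × 10k similarity matrix is two billion float comparisons, which is why keeping the vectors and intermediate results in an in-memory store paid off in my case.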
The data I used was confidential, hence it can't be shared, but you could definitely replicate my findings with your own data set. The code is real, working code, not pseudocode. I would love to learn about your approach using simpler data structures, though.