It is a simple one. For each of the words, generate a descriptor or a hash table with 26 keys as alphabets a, b, c, d, ... z and for corresponding values just have the frequency of occurrence of those characters in the word [0 for no-occurrence]. Now, for n words you will generate n descriptors or hash tables. Now you can compare the first descriptor with rest of the descriptors using hamming distance concept and mark the ones that are similar and counted. In the worst case, this will give you a complexity of O(n^2) [n-2 + n-1 + n-3 + ... + 1 -> Comparisons of descriptors]. There could be other ways to optimize this O(n^2) complexity time as well.