The data set contains the following variables, one per column. The first column gives the type of match, a categorical (factor) variable indicating whether the match was a “Perfect match” (i.e. the searched term and the value returned by our script were identical), a “Synonym match” (i.e. matched via the list of synonyms provided by DrugBank) or a “Fuzzy Search” match (obtained using the methodology described above). The second column lists the searched term and the third column the returned term. The fourth column contains known synonyms, and the fifth column contains Freebase IDs or GKGI.
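As an illustration of how the three match types relate to one another, the following minimal Python sketch assigns a match type to a pair of searched and returned terms. It is not the script used to build the data set: the function name `classify_match`, the small `SYNONYMS` table, and the 0.8 similarity threshold are hypothetical, and the standard-library `difflib.SequenceMatcher` is used only as a stand-in for the fuzzy search methodology described above.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table keyed by returned term.
# In the data set, synonyms are taken from DrugBank.
SYNONYMS = {
    "acetaminophen": ["paracetamol", "APAP"],
}


def classify_match(searched, returned, synonyms=SYNONYMS, threshold=0.8):
    """Return the match type for a (searched, returned) pair, or None.

    The order of checks mirrors the three levels of the match-type
    variable: perfect match, then synonym match, then fuzzy search.
    """
    s = searched.strip().lower()
    r = returned.strip().lower()

    # Perfect match: both terms are identical after normalisation.
    if s == r:
        return "Perfect match"

    # Synonym match: the searched term is a known synonym of the returned term.
    known = {name.lower() for name in synonyms.get(r, [])}
    if s in known:
        return "Synonym match"

    # Fuzzy search (stand-in): similarity ratio above an assumed threshold.
    if SequenceMatcher(None, s, r).ratio() >= threshold:
        return "Fuzzy Search"

    return None


if __name__ == "__main__":
    for term in ["Acetaminophen", "paracetamol", "acetaminophn", "aspirin"]:
        print(term, "->", classify_match(term, "acetaminophen"))
```

Checking the categories in this order means that a pair qualifying as both a synonym match and a fuzzy match is reported under the stricter category; whether the original script resolves such cases the same way is an assumption.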