This directory contains the following scripts:
  sentSimilarity.pl
  process_td.pl
  rank_new.pl
  new_idf.pl

and the following data files:
  myDictionary.txt.sorted
  stoplist.txt

Installation Steps:
- Install Perl and the CPAN libraries listed below; the scripts need them to run.
- CPAN libraries:
  Text::Similarity::Overlaps
  Text::OverlapFinder
  Getopt::Long

Steps to Produce Semantically Related Words:
Follow these steps to produce semantically related words and rank the resulting word list:

1. Given an input file that contains a list of sentences (e.g., commons.sentence), first use new_idf.pl to compute an idf score for every word in commons.sentence.
   Usage: ./new_idf.pl
   This produces a tf-idf file (e.g., commons.tf) that maps each word to its idf value.

2. Use sentSimilarity.pl to produce semantically related words from the input file.
   Usage: ./sentSimilarity.pl -i -thres -output -idf
   You can set different configurations (e.g., thresholds, gap, whether to use idf). More details can be found in the settings part of sentSimilarity.pl and in our MSR'12 paper.

3. Use process_td.pl to process the raw output from sentSimilarity.pl. This script removes duplicates and performs stemming.
   Usage: ./process_td.pl

4. Use rank_new.pl to rank the output from process_td.pl (note: pass it the output file, not the cluster file).
   Usage: ./rank_new.pl
   The output file prefix_new_rank.csv is the final ranking file. The ranking is based on average idf value and support.
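
For readers unfamiliar with idf, the kind of score computed in step 1 can be sketched as follows. This is only an illustration in Python, not the code in new_idf.pl. It assumes each sentence counts as one document, words are split on whitespace and lowercased, and idf is the natural log of (number of sentences / number of sentences containing the word). The function name compute_idf and the sample sentences are hypothetical, and the actual script may tokenize, apply the stoplist, or scale differently.

```python
import math
from collections import Counter


def compute_idf(sentences):
    """Return {word: idf}, treating each sentence as one document (assumption)."""
    n = len(sentences)
    df = Counter()
    for sentence in sentences:
        # Count each word at most once per sentence (document frequency).
        df.update(set(sentence.lower().split()))
    # idf = ln(N / df); words that occur in every sentence get 0.
    return {word: math.log(n / count) for word, count in df.items()}


# Hypothetical sample input, one sentence per entry.
sentences = [
    "open the file",
    "close the file",
    "read the stream",
]

idf = compute_idf(sentences)
for word in sorted(idf, key=idf.get):
    print(f"{word}\t{idf[word]:.3f}")
```

Words that appear in every sentence (here "the") score 0, while words that appear in a single sentence score highest. This is why an average idf value can help push uninformative words down a ranking.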