LIBSVM-Reuters experiments

From Humanitarian FOSS Summer Institute


Experiments

  • Using articles from the Reuters database, we built a vector for each article that holds the weight of each word in that article. To do this, we:
    • Parsed the articles into a readable format.
    • Created a dictionary of all the words in the set of articles, along with the frequency of each word in each document.
    • Computed the inverse frequency of each word: log(number of documents / number of documents containing the word).
    • Computed the weight of each word in each document by multiplying its frequency by its inverse frequency.
    • For each document, created a vector in the format read by libSVM.
    • Separated the vectors into a training group and a testing group, and fed them to libSVM.
    • Calculated sensitivity and specificity from the results returned by libSVM.
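
The weighting and export steps above can be sketched in Python as follows. This is a minimal illustration, not the code used in the experiments: the function names `tfidf_vectors` and `to_libsvm_line` are hypothetical, the natural logarithm is assumed (the base is not stated above), raw counts are assumed for the word frequency, and feature indices are assigned alphabetically starting at 1.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs is a list of token lists; returns (word -> index, list of sparse vectors)."""
    # Frequency of each word in each document (raw counts, an assumption).
    counts = [Counter(tokens) for tokens in docs]

    # Number of documents containing each word.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    # Inverse frequency: log(N / number of documents containing the word).
    n_docs = len(docs)
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}

    # Feature indices start at 1, as libSVM's sparse format expects.
    index = {w: i for i, w in enumerate(sorted(doc_freq), start=1)}

    # Weight = frequency * inverse frequency, stored as {feature index: weight}.
    vectors = [{index[w]: tf * idf[w] for w, tf in c.items()} for c in counts]
    return index, vectors


def to_libsvm_line(label, vector):
    """Format one document as '<label> <index>:<value> ...' with ascending indices."""
    feats = " ".join(
        f"{i}:{v:.6g}" for i, v in sorted(vector.items()) if v != 0.0
    )
    return f"{label} {feats}".rstrip()


docs = [["oil", "price", "oil"], ["price", "rise"]]
index, vectors = tfidf_vectors(docs)
lines = [to_libsvm_line(1, vectors[0]), to_libsvm_line(-1, vectors[1])]
```

A word that appears in every document gets an inverse frequency of log(1) = 0, so it drops out of the sparse vector entirely ("price" in the toy input above).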


  • To obtain accurate results, we took the following precautions:
    • Used only documents labeled with "TOPICS=YES" in the Reuters database.
    • Used only documents with at least one word in the body.
    • Kept only words that are more than three characters long.
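
A rough sketch of these filters, assuming each article has already been parsed into a dictionary. The keys `topics_attr` and `body`, the function names, and the tokenization rule (lowercased runs of letters) are assumptions made for illustration and are not taken from the experiments.

```python
import re

MIN_WORD_LENGTH = 4  # "more than three characters long"


def tokenize(body):
    """Split a body into lowercase alphabetic words and drop the short ones."""
    words = re.findall(r"[a-z]+", body.lower())
    return [w for w in words if len(w) >= MIN_WORD_LENGTH]


def keep_document(record):
    """Apply the two document-level precautions to a parsed article."""
    # Precaution 1: the article must carry TOPICS=YES.
    if record.get("topics_attr") != "YES":
        return False
    # Precaution 2: the body must contain at least one word
    # (checked here before the length filter, which is an assumption).
    return bool(record.get("body", "").split())


records = [
    {"topics_attr": "YES", "body": "Oil prices rose in May"},
    {"topics_attr": "NO", "body": "Some unlabeled text"},
    {"topics_attr": "YES", "body": "   "},
]
kept = [r for r in records if keep_document(r)]
tokens = [tokenize(r["body"]) for r in kept]
```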


Results
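
For reference, sensitivity is the fraction of positive documents that the classifier labels positive, and specificity is the fraction of negative documents that it labels negative. The sketch below shows one way to compute both from true and predicted labels. The function name and the +1/-1 label encoding are assumptions; the page does not say how the labels were encoded.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classification result."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive correctly found
            else:
                fn += 1  # positive missed
        else:
            if pred == positive:
                fp += 1  # negative wrongly flagged
            else:
                tn += 1  # negative correctly rejected
    # Guard against a test set that lacks one of the two classes.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


y_true = [1, 1, 1, -1, -1]
y_pred = [1, -1, 1, -1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```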

  • Test 1
Sensitivity and Specificity results for Test 1
Training Vectors   Sensitivity   Specificity
2                  0.45          0.83
5                  0.99          0.03
10                 0.48          0.95
20                 0.41          1
30                 0.46          1
50                 0.53          1
75                 0.53          1
100                0.54          1
200                0.59          1
300                0.72          0.99
400                0.76          0.99
500                0.81          0.98



  • Test 2
Sensitivity and Specificity results for Test 2
Training Vectors   Sensitivity   Specificity
2                  0.15          0.96
5                  0.27          1
10                 0.14          1
20                 0.20          1
30                 0.31          1
50                 0.28          1
75                 0.37          1
100                0.41          1
200                0.48          1
300                0.49          1
400                0.57          1
500                0.57          1




  • Test 3
Sensitivity and Specificity results for Test 3
Training Vectors   Sensitivity   Specificity
2                  0.72          0.59
5                  0.24          0.98
10                 0.55          0.98
20                 0.83          0.91
30                 0.67          0.97
50                 0.59          0.98
75                 0.75          0.98
100                0.74          0.98
200                0.88          0.98
300                0.91          0.98
400                0.93          0.99
500                0.93          1



  • Test 4
Sensitivity and Specificity results for Test 4
Training Vectors   Sensitivity   Specificity
2                  0.32          0.86
5                  0.11          1
10                 0.1           1
20                 0.09          1
30                 0.09          1
50                 0.4           1
75                 0.35          0.99
100                0.62          0.99
200                0.81          0.99
300                0.89          0.99
400                0.89          0.99
500                0.92          0.98





  • Final Average Result
Sensitivity and Specificity average results
Training Vectors   Sensitivity   Specificity
2                  0.41          0.81
5                  0.40          0.75
10                 0.32          0.98
20                 0.38          0.98
30                 0.38          0.99
50                 0.45          1
75                 0.50          0.99
100                0.58          0.99
200                0.69          0.99
300                0.75          0.99
400                0.79          0.99
500                0.81          0.99
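
The page does not state how the averages were computed, but the values are consistent with the arithmetic mean of the four tests, rounded to two decimals. The sketch below checks this for two rows, using figures copied from the tables above; the names `results` and `average_row` are illustrative only.

```python
from statistics import mean

# (sensitivity, specificity) for Tests 1 to 4, keyed by number of training vectors.
results = {
    2: [(0.45, 0.83), (0.15, 0.96), (0.72, 0.59), (0.32, 0.86)],
    500: [(0.81, 0.98), (0.57, 1.0), (0.93, 1.0), (0.92, 0.98)],
}


def average_row(rows):
    """Mean sensitivity and specificity across tests, rounded to two decimals."""
    sens = round(mean(s for s, _ in rows), 2)
    spec = round(mean(p for _, p in rows), 2)
    return sens, spec


averages = {n: average_row(rows) for n, rows in results.items()}
```

Rows whose mean falls exactly on a rounding boundary (for example a specificity mean of 0.995) may round differently under binary floating point than in the table.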