LIBSVM-Reuters experiments
From Humanitarian FOSS Summer Institute
Experiments
- Using articles from the Reuters database, we built a vector for each article that holds the weight of every word in that article. To do this, we:
- Parsed the articles into a readable format.
- Created a dictionary of all the words that appear across the articles, along with the frequency of each word in each document.
- Obtained the inverse document frequency of each word: log(number of documents / number of documents containing the word).
- Obtained the weight of each word in each document by multiplying its frequency in that document by its inverse document frequency.
- For each document, created vectors that can be read by libSVM.
- Separated the vectors into a training group and a testing group, and fed them to libSVM.
- Calculated sensitivity and specificity from the predictions returned by libSVM.
- To obtain accurate results, we took the following precautions:
- Used only documents labeled with "TOPICS=YES" in the Reuters database.
- Used only documents with at least one word in the body.
- Used only words that are more than three characters long.
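The weighting steps above can be sketched in Python as follows. This is a minimal illustration, not the code used for these experiments. The tokenizer, the use of raw counts as the word frequency, the natural logarithm, and the function names `tokenize`, `tfidf_vectors`, and `to_libsvm_line` are all assumptions. The output layout (`<label> <index>:<value> ...`, with ascending feature indices starting at 1) is the sparse text format that libSVM reads.

```python
import math
import re
from collections import Counter


def tokenize(text, min_len=4):
    """Split text into lowercase words, keeping only words longer than three characters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len]


def tfidf_vectors(docs):
    """Return (word -> feature index, list of {feature index: weight}) for the documents."""
    counts = [Counter(tokenize(d)) for d in docs]  # word frequency per document
    n_docs = len(counts)

    # Number of documents containing each word.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    # Feature indices start at 1 (assumed convention for the libSVM input file).
    index = {w: i for i, w in enumerate(sorted(doc_freq), start=1)}

    # Inverse document frequency: log(N / number of documents containing the word).
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}

    vectors = [{index[w]: tf * idf[w] for w, tf in c.items()} for c in counts]
    return index, vectors


def to_libsvm_line(label, vector):
    """Format one sparse vector as a libSVM input line, omitting zero weights."""
    feats = " ".join(f"{i}:{v:.6g}" for i, v in sorted(vector.items()) if v != 0.0)
    return f"{label} {feats}".rstrip()


if __name__ == "__main__":
    # Hypothetical toy documents, not taken from the Reuters data.
    docs = ["Grain and wheat exports rise", "Wheat prices fall", "Bank rates rise"]
    index, vectors = tfidf_vectors(docs)
    for label, vec in zip([1, 1, -1], vectors):
        print(to_libsvm_line(label, vec))
```

A word that occurs in every document gets an inverse document frequency of log(1) = 0, so it drops out of the sparse vector entirely.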
Results
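The tables below report sensitivity and specificity. Assuming the usual definitions (sensitivity is the fraction of positive test documents predicted positive, and specificity is the fraction of negative test documents predicted negative), they could be computed from the predicted labels roughly as in this sketch. The function name and the +1/-1 label convention are assumptions, not details taken from the experiments.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive=1):
    """Return (sensitivity, specificity) for two equal-length label sequences."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive document correctly recognized
            else:
                fn += 1  # positive document missed
        else:
            if pred == positive:
                fp += 1  # negative document wrongly flagged
            else:
                tn += 1  # negative document correctly rejected

    # Guard against a test set that lacks one of the two classes.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    truth = [1, 1, 1, 1, -1, -1, -1, -1, -1]
    pred = [1, 1, 1, -1, -1, -1, -1, -1, 1]
    print(sensitivity_specificity(truth, pred))
```

Read this way, a row with low sensitivity and a specificity of 1 describes a classifier that labels almost every test document as negative.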
- Test 1
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 0.45 | 0.83 |
| 5 | 0.99 | 0.03 |
| 10 | 0.48 | 0.95 |
| 20 | 0.41 | 1 |
| 30 | 0.46 | 1 |
| 50 | 0.53 | 1 |
| 75 | 0.53 | 1 |
| 100 | 0.54 | 1 |
| 200 | 0.59 | 1 |
| 300 | 0.72 | 0.99 |
| 400 | 0.76 | 0.99 |
| 500 | 0.81 | 0.98 |
- Test 2
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 0.15 | 0.96 |
| 5 | 0.27 | 1 |
| 10 | 0.14 | 1 |
| 20 | 0.20 | 1 |
| 30 | 0.31 | 1 |
| 50 | 0.28 | 1 |
| 75 | 0.37 | 1 |
| 100 | 0.41 | 1 |
| 200 | 0.48 | 1 |
| 300 | 0.49 | 1 |
| 400 | 0.57 | 1 |
| 500 | 0.57 | 1 |
- Test 3
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 0.72 | 0.59 |
| 5 | 0.24 | 0.98 |
| 10 | 0.55 | 0.98 |
| 20 | 0.83 | 0.91 |
| 30 | 0.67 | 0.97 |
| 50 | 0.59 | 0.98 |
| 75 | 0.75 | 0.98 |
| 100 | 0.74 | 0.98 |
| 200 | 0.88 | 0.98 |
| 300 | 0.91 | 0.98 |
| 400 | 0.93 | 0.99 |
| 500 | 0.93 | 1 |
- Test 4
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 0.32 | 0.86 |
| 5 | 0.11 | 1 |
| 10 | 0.1 | 1 |
| 20 | 0.09 | 1 |
| 30 | 0.09 | 1 |
| 50 | 0.4 | 1 |
| 75 | 0.35 | 0.99 |
| 100 | 0.62 | 0.99 |
| 200 | 0.81 | 0.99 |
| 300 | 0.89 | 0.99 |
| 400 | 0.89 | 0.99 |
| 500 | 0.92 | 0.98 |
- Final Average Result (mean of Tests 1 to 4 for each training set size, rounded to two decimals)
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 0.41 | 0.81 |
| 5 | 0.40 | 0.75 |
| 10 | 0.32 | 0.98 |
| 20 | 0.38 | 0.98 |
| 30 | 0.38 | 0.99 |
| 50 | 0.45 | 1 |
| 75 | 0.50 | 0.99 |
| 100 | 0.58 | 0.99 |
| 200 | 0.69 | 0.99 |
| 300 | 0.75 | 0.99 |
| 400 | 0.79 | 0.99 |
| 500 | 0.81 | 0.99 |
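Each row of the average table matches the unweighted mean of the corresponding rows in Tests 1 to 4, rounded to two decimals. The short sketch below reproduces two of the rows from the values in the tables above; the function name `average_rows` is illustrative.

```python
def average_rows(rows):
    """Average (sensitivity, specificity) pairs column by column, rounded to two decimals."""
    return tuple(round(sum(col) / len(col), 2) for col in zip(*rows))


# (sensitivity, specificity) from Tests 1 to 4 with 200 training vectors.
rows_200 = [(0.59, 1.0), (0.48, 1.0), (0.88, 0.98), (0.81, 0.99)]
# (sensitivity, specificity) from Tests 1 to 4 with 100 training vectors.
rows_100 = [(0.54, 1.0), (0.41, 1.0), (0.74, 0.98), (0.62, 0.99)]

if __name__ == "__main__":
    print(average_rows(rows_200))  # -> (0.69, 0.99)
    print(average_rows(rows_100))  # -> (0.58, 0.99)
```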

