LIBSVM-Promed experiments
From Humanitarian FOSS Summer Institute
Experiments
- Using articles from the Pro-Med database, we built a vector for each article that holds the weight of each word in that article. To do this, we:
- Parsed the articles into a readable format.
- Created a dictionary of all the words in the set of articles, along with the frequency of each word in each document.
- Obtained the inverse document frequency of each word: log(Number of Documents / Number of Documents Containing this Word)
- Obtained the weight of each word in each document by multiplying its frequency in that document by its inverse document frequency.
- Created a vector for each document in the format read by libSVM.
- Separated the vectors into a training group and a testing group, and fed them to libSVM
- Calculated sensitivity and specificity based on the results returned by libSVM
- To obtain accurate results, we took the following precautions:
- Only used words that are more than three characters long.
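The weighting steps above can be sketched in Python as follows. This is a minimal illustration, not the code used in the experiments: the function names (`tokenize`, `tfidf_vectors`, `to_libsvm_line`) are hypothetical, and it assumes that "frequency" means the raw word count in a document, that the logarithm is the natural log, and that words are lowercase alphabetic runs. The output line follows libSVM's sparse text format, `<label> <index>:<value> ...`, with feature indices in ascending order.

```python
import math
import re
from collections import Counter


def tokenize(text, min_len=4):
    """Split text into lowercase words, keeping only words longer than three characters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len]


def tfidf_vectors(documents):
    """Return (vocab, vectors), where each vector maps a feature index to a TF-IDF weight."""
    # Word frequency per document (assumption: raw counts).
    counts = [Counter(tokenize(doc)) for doc in documents]

    # Number of documents containing each word.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    n_docs = len(documents)
    # Feature indices start at 1 (assumption: the usual libSVM convention).
    vocab = {word: i + 1 for i, word in enumerate(sorted(doc_freq))}

    vectors = []
    for c in counts:
        # weight = frequency * log(N / number of documents containing the word)
        vectors.append(
            {vocab[w]: tf * math.log(n_docs / doc_freq[w]) for w, tf in c.items()}
        )
    return vocab, vectors


def to_libsvm_line(label, vector):
    """Format one document as a libSVM line, omitting zero-weight features."""
    feats = " ".join(f"{i}:{v:.6g}" for i, v in sorted(vector.items()) if v != 0)
    return f"{label} {feats}".rstrip()


docs = ["fever outbreak reported", "outbreak of the flu virus"]
vocab, vectors = tfidf_vectors(docs)
lines = [to_libsvm_line(label, vec) for label, vec in zip([1, -1], vectors)]
```

A word that appears in every document gets weight log(1) = 0, so it drops out of the sparse vector. In the two-document example, "outbreak" is removed this way and "of", "the", and "flu" are removed by the length filter.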
Results
- Test 1
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 1 | 0.7 |
| 5 | 0 | 1 |
| 10 | 0 | 1 |
| 15 | 0.62 | 0.98 |
| 20 | 0.92 | 0.97 |
| 30 | 0.67 | 1 |
| 40 | 1 | 1 |
| 50 | 1 | 1 |
| 60 | 1 | 1 |
- Test 2
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 1 | 0.69 |
| 5 | 0.71 | 1 |
| 10 | 1 | 0.98 |
| 15 | 1 | 0.98 |
| 20 | 0.88 | 0.98 |
| 30 | 0.86 | 0.97 |
| 40 | 0.75 | 0.96 |
| 50 | 0.75 | 1 |
| 60 | 1 | 1 |
- Test 3
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 1 | 0.55 |
| 5 | 0 | 1 |
| 10 | 0 | 1 |
| 15 | 0.85 | 0.98 |
| 20 | 0.9 | 0.97 |
| 30 | 0.83 | 0.97 |
| 40 | 1 | 0.96 |
| 50 | 1 | 1 |
| 60 | 1 | 1 |
- Final Average Result
| Training Vectors | Sensitivity | Specificity |
|---|---|---|
| 2 | 1 | 0.64 |
| 5 | 0.24 | 1 |
| 10 | 0.33 | 0.99 |
| 15 | 0.82 | 0.98 |
| 20 | 0.9 | 0.97 |
| 30 | 0.79 | 0.98 |
| 40 | 0.92 | 0.97 |
| 50 | 0.92 | 1 |
| 60 | 1 | 1 |
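
For reference, sensitivity and specificity can be computed from the true labels and the labels predicted by libSVM using the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below is illustrative only. The function name `sensitivity_specificity` is hypothetical, and the label encoding (+1 for the positive class, -1 for the negative class) is an assumption, since the page does not state which class was treated as positive.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for two equal-length label sequences."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)

    # Undefined when a class is absent from the test set; report NaN in that case.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Assumed encoding: +1 = positive class, -1 = negative class.
actual = [1, 1, 1, -1, -1]
predicted = [1, 1, -1, -1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
```

Under these definitions, a row with sensitivity 0 and specificity 1 means that no positive test article was classified as positive and no negative test article was classified as positive.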


