EVALASSIST - AI-Powered Assisted Evaluations
EvalAssist helps evaluators grade subjective (long-answer) questions at scale with the help of AI and NLP algorithms. It aims to increase efficiency, reduce workload, and improve the scoring consistency of evaluators.
To use EvalAssist, an ideal answer is added as reference text while creating a question on the Examine platform. Each candidate's response is then evaluated against the ideal answer on the following parameters:
- COSINE SIMILARITY: A measure of how similar two pieces of text are, calculated as the cosine of the angle between the word embeddings of the response and the ideal answer.
- TOPIC SIMILARITY: The similarity between the important topics/keywords present in both texts.
- SUMMARY COSINE SIMILARITY: Abstractive summaries of both texts are generated, and the cosine of the angle between the word embeddings of the two summaries is calculated.
- LENGTH CRITERIA SIMILARITY: This measure ensures that the candidate's response is of satisfactory length compared to the ideal answer.
Scores for each of the above parameters are calculated and stored separately before the final score is generated.
Note – All the parameters above are scored in the range 0-1.
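As an illustration of how such parameters can be computed, here is a minimal Python sketch. TF-IDF vectors stand in for the word embeddings EvalAssist actually uses (the embedding model is not specified here), the cap on the length score is an assumption to keep it in the 0-1 range, and the function names are illustrative:

```python
# Minimal sketch: 0-1 similarity scores between a candidate response and the
# ideal answer. TF-IDF vectors are used as a stand-in for word embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_score(response: str, ideal_answer: str) -> float:
    """Cosine similarity between the two texts, in the range 0-1."""
    vectors = TfidfVectorizer().fit_transform([response, ideal_answer])
    return float(cosine_similarity(vectors[0], vectors[1])[0][0])

def length_score(response: str, ideal_answer: str) -> float:
    """Length criteria similarity: response word count / ideal answer word count."""
    ratio = len(response.split()) / max(len(ideal_answer.split()), 1)
    return min(ratio, 1.0)  # capped at 1 so the score stays in 0-1 (assumption)

ideal = "Photosynthesis converts light energy into chemical energy stored as glucose."
answer = "Plants use light energy to make glucose, storing chemical energy."
print(round(cosine_score(answer, ideal), 2), round(length_score(answer, ideal), 2))
```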
Scoring and Confidence Logic
Cosine similarity, topic similarity, summary cosine similarity, and length criteria similarity are all considered when generating the final score and AI confidence. Please refer to the example below:
Score Calculation
Overall Marks:
Let us suppose the maximum marks for a question are 10 marks.
Scores for the parameters mentioned are as follows:
- Cosine Similarity – 0.8 (0-1)
- Topic Similarity – 0.75 (0-1)
- Summary Cosine Similarity – 0.68 (0-1)
- Length Criteria Similarity – 0.6 ({response word count}/ {ideal answer word count})
Weightage Assigned to each parameter:
- Weightage of cosine similarity – 30%
- Weightage of topic similarity – 25%
- Weightage of summary cosine similarity – 25%
- Weightage of length similarity – 20%
Max marks for each parameter:
- Max marks for cosine similarity according to weightage - (30/100) * 10 = 3
- Max marks for topic similarity according to weightage - (25/100) * 10 = 2.5
- Max marks for summary cosine similarity according to weightage - (25/100) * 10 = 2.5
- Max marks for length similarity according to weightage - (20/100) * 10 = 2
Score of each parameter:
- Cosine similarity out of its max marks = 0.8 * 3 = 2.4
- Topic similarity out of its max marks = 0.75 * 2.5 = 1.875
- Summary cosine similarity out of its max marks = 0.68 * 2.5 = 1.7
- Length similarity out of its max marks = 0.7 * 2 = 1.4 (see note below)
Note – The length similarity is taken as 0.7 instead of 0.6 because a 0.1 advantage is given to the candidate: a response may not always be as lengthy as the ideal answer, yet its quality can still be good.
Overall marks = 2.4 + 1.875 + 1.7 + 1.4 = 7.375 out of 10, reported as 7.37/10
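For readers who prefer code, the sketch below simply reproduces the arithmetic of this example (weightages, maximum marks, and per-parameter scores, including the 0.7 used for length similarity). The dictionary layout and names are illustrative, not EvalAssist's internal structure:

```python
# Reproduces the worked example: weightage -> max marks per parameter,
# then parameter score * max marks, summed into the overall marks.
MAX_MARKS = 10  # maximum marks for the question

weightage = {  # weightage of each parameter, in percent
    "cosine": 30,
    "topic": 25,
    "summary_cosine": 25,
    "length": 20,
}

scores = {  # 0-1 parameter scores (length already includes the 0.1 advantage)
    "cosine": 0.80,
    "topic": 0.75,
    "summary_cosine": 0.68,
    "length": 0.70,
}

overall = 0.0
for name, weight in weightage.items():
    param_max = (weight / 100) * MAX_MARKS  # e.g. cosine: (30/100) * 10 = 3
    overall += scores[name] * param_max     # e.g. cosine: 0.8 * 3 = 2.4

print(round(overall, 3))  # 7.375, reported as 7.37/10
```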
AI Confidence Calculation
Confidence Thresholds -
- Low Confidence - (0-0.3)
- Medium Confidence - (0.3-0.7)
- High Confidence - (0.7-1)
According to the above scores -
- Cosine AI Confidence – High (0.8)
- Topic AI Confidence – High (0.75)
- Summary AI Confidence – Medium (0.68)
- Length AI Confidence – Medium (0.6)
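Expressed in code, the threshold-to-band mapping above looks like the sketch below. How the boundary values 0.3 and 0.7 themselves are classified is not stated, so the strict `<` comparisons are an assumption:

```python
# Maps a 0-1 parameter score to a confidence band using the thresholds above.
# Handling of the exact boundary values (0.3, 0.7) is an assumption.
def confidence_band(score: float) -> str:
    if score < 0.3:
        return "Low"
    if score < 0.7:
        return "Medium"
    return "High"

example = {"cosine": 0.80, "topic": 0.75, "summary_cosine": 0.68, "length": 0.60}
for name, score in example.items():
    print(name, confidence_band(score))
# cosine High, topic High, summary_cosine Medium, length Medium
```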
Final Confidence is calculated using the table below.
So, in our example we have 2 parameters with high confidence and 2 parameters with medium confidence; looking this combination up in the table gives the final confidence.
The final confidence from the table for our example is – Medium
Note: In the Final Confidence table, 2 stands for high confidence, 1 for medium confidence, and 0 for low confidence.
So, the final scoring is as follows -
Overall Marks = 7.37/10
AI Confidence = Medium