EVALASSIST - AI-Powered Assisted Evaluations

EvalAssist helps evaluators grade subjective (long-answer) questions at scale using AI and NLP algorithms. It aims to increase evaluator efficiency, reduce workload, and improve scoring consistency.

How It Works

There are two ways EvalAssist grades answers: 

1. With a Reference Answer: 

When creating questions, an ideal answer is added to the system. EvalAssist compares students’ answers to this reference to decide how well they did. This is useful for technical or subject-specific questions. 

2. Without a Reference Answer: 

For questions like essays where there isn’t a specific model answer, the system grades based on general language and relevance rules. This is good for open-ended or non-technical questions.   

Parameters  

When using Reference Answer:
To run EvalAssist with a reference, an ideal answer is added as reference text when the question is created on the Examiner platform. Each candidate response is evaluated against the ideal answer on the following parameters:
  1. COSINE SCORE: A measure of how similar two pieces of text are, calculated as the cosine of the angle between the word embeddings of the response and the ideal answer.
  2. TOPIC SCORE: The similarity between the important topics/keywords present in both texts.
  3. TEXT SUMMARY SCORE: Abstractive summaries of both texts are generated, and the cosine of the angle between the word embeddings of the two summaries is calculated.
  4. LENGTH OF RESPONSE: This measure is included to ensure that the candidate's response is of satisfactory length in comparison to the ideal answer.
  5. TEXT SEMANTIC SCORE: This parameter measures the degree to which both texts are similar in meaning and context.
  6. GRAMMAR AND SPELL CHECK: This counts the total number of grammatical and spelling errors in the candidate's response.
  7. RELEVANCY OCCURRENCE: This parameter measures the frequency of commonly occurring topics that appear in both the candidate's answer and the ideal answer.
  8. COHERENCE: This metric captures the logical flow within the text, that is, how well sentences and ideas relate to adjacent or nearby sentences.
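
The cosine calculation behind the COSINE SCORE and TEXT SUMMARY SCORE can be sketched as follows. This is a minimal illustration in Python, not EvalAssist's actual implementation: the function name, the tiny three-dimensional vectors, and the handling of all-zero vectors are assumptions, and the embedding model that would produce real vectors is not specified in this document.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero embedding is treated as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, for illustration only.
response_vec = [1.0, 2.0, 3.0]
ideal_vec = [2.0, 4.0, 6.0]  # same direction as response_vec

print(cosine_similarity(response_vec, ideal_vec))
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0. A raw cosine can also be negative, so a real pipeline would need some rule for mapping it into the 0-1 range used here; that rule is not described in this document.
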
When grading without Reference Answer: 
When there is no reference text, as in essay grading, the candidate's response is evaluated on certain language parameters and on its relevance to the question text. The parameters are:
  1. TOPIC SCORE: The similarity between the important topics/keywords present in the candidate's response and the topics generated by AI from the question text.
  2. LENGTH OF RESPONSE: This is calculated solely from the length of the candidate's response.
  3. GRAMMAR AND SPELL CHECK: This counts the total number of grammatical and spelling errors in the candidate's response.
  4. RELEVANCY OCCURRENCE: This parameter measures the frequency of commonly occurring topics that are relevant to the question.
  5. COHERENCE: This metric captures the logical flow within the text, that is, how well sentences and ideas relate to adjacent or nearby sentences.
Scores on each of the above parameters are calculated separately and stored before the final score is generated.

Note – All the parameters above have a score in the range of 0-1 

Scoring and Confidence Logic

With Reference Answer:  
All of the parameters stated above can be used when evaluating against a reference text, and their weightages can be set as per the client's requirements.

Without Reference Answer: 
When evaluating without a reference text, the following parameters can be used: topic similarity, length criteria similarity, grammar and spell check, relevancy occurrence, and coherence. The weightage for each can be set as per the client's requirements.

Refer to the example below for the calculation:

Score Calculation 

With Reference Text: 

Overall Marks: 
Suppose the maximum marks for a question is 10.

Scores for the parameters mentioned are as follows:
  1. Cosine Similarity – 0.8 (0-1) 
  2. Topic Similarity – 0.75 (0-1) 
  3. Summary Cosine Similarity – 0.68 (0-1) 
  4. Length Criteria Similarity – 0.6 ({response word count}/ {ideal answer word count}) 
  5. Semantic Similarity – 0.56 (0-1)
  6. Grammar & Spell Score – 0.25 (0-1)
  7. Relevancy Occurrence Score – 0.5 (0-1)
  8. Coherence – 0.75 (0-1)
Weightage Assigned to each parameter: 
  1. Weightage of cosine similarity – 10% 
  2. Weightage of topic similarity – 20% 
  3. Weightage of summary cosine similarity – 15% 
  4. Weightage of length similarity – 10% 
  5. Weightage of semantic similarity – 20% 
  6. Weightage of grammar and spell – 5%
  7. Weightage of relevancy occurrence – 10% 
  8. Weightage of coherence – 10% 
Max marks for each parameter: 
  1. Max marks for cosine similarity according to weightage - (10/100) *10 = 1 
  2. Max marks for topic similarity according to weightage - (20/100) *10 = 2 
  3. Max marks for summary cosine similarity according to weightage - (15/100) *10 = 1.5 
  4. Max marks for length similarity according to weightage - (10/100) *10 = 1 
  5. Max marks for semantic similarity according to weightage - (20/100) *10 = 2 
  6. Max marks for grammar and spell according to weightage - (5/100) *10 = 0.5 
  7. Max marks for relevancy occurrence according to weightage - (10/100) *10 = 1 
  8. Max marks for coherence according to weightage - (10/100) *10 = 1
Score of each parameter: 
  1. Cosine similarity out of its max marks = 0.8*1 = 0.8 
  2. Topic similarity out of its max marks = 0.75*2 = 1.5 
  3. Summary Cosine similarity out of its max marks = 0.68*1.5 = 1.02 
  4. Length similarity out of its max marks = 0.7* 1 = 0.7 
  5. Semantic similarity out of its max marks = 0.56*2 = 1.12 
  6. Grammar and spell out of its max marks = 0.25*0.5 = 0.125 
  7. Relevancy occurrence out of its max marks = 0.5*1 = 0.5 
  8. Coherence out of its max marks = 0.75* 1 = 0.75 
Note – The length similarity is taken as 0.7 rather than 0.6 because the candidate is given an extra 0.1 allowance: a response may be shorter than the ideal answer and still be of good quality.

Overall marks = 0.8 + 1.5 + 1.02 + 0.7 + 1.12 + 0.125 + 0.5 + 0.75 = 6.515/10
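
The weighted calculation above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not EvalAssist's code: the function names and dictionary keys are hypothetical, the word counts 60 and 100 are invented to yield the 0.6 ratio from the example, and capping the adjusted length score at 1.0 is an assumption that this document does not confirm.

```python
def length_score(response_words, ideal_words, bonus=0.1):
    """Word-count ratio plus the 0.1 allowance described in the note."""
    ratio = response_words / ideal_words
    # Assumption: the adjusted score is capped at 1.0.
    return min(1.0, ratio + bonus)


def overall_marks(scores, weights, max_marks):
    """Sum of score * (weight / 100) * max_marks over all parameters."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same parameters")
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("weightages must add up to 100%")
    return sum(scores[name] * weights[name] / 100 * max_marks for name in scores)


# Parameter scores from the example (length already includes the 0.1 allowance).
scores = {
    "cosine": 0.8,
    "topic": 0.75,
    "summary": 0.68,
    "length": length_score(60, 100),  # hypothetical word counts: 0.6 + 0.1
    "semantic": 0.56,
    "grammar_spell": 0.25,
    "relevancy": 0.5,
    "coherence": 0.75,
}

# Weightages in percent, as assigned in the example.
weights = {
    "cosine": 10,
    "topic": 20,
    "summary": 15,
    "length": 10,
    "semantic": 20,
    "grammar_spell": 5,
    "relevancy": 10,
    "coherence": 10,
}

total = overall_marks(scores, weights, max_marks=10)
print(round(total, 3))  # 6.515
```

Checking that the weightages add up to 100% guards against a configuration in which the parameter marks could not sum to the question's maximum.
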

AI Confidence Calculation

AI confidence is calculated based on the number of parameters selected and the following confidence thresholds.

Confidence Thresholds:
  1. Low Confidence - (0-0.3) 
  2. Medium Confidence - (0.3-0.7) 
  3. High Confidence - (0.7-1) 
Confidence Calculation General Rules: 
  1. The confidence level that occurs most often across the parameters is assigned to the response.
  2. If two levels, or all three, are tied for the highest count, Medium confidence is assigned.
According to the above example, the confidence calculation is as follows:
  1. Cosine AI Confidence – High (0.8) 
  2. Topic AI Confidence – High (0.75) 
  3. Summary AI Confidence – Medium (0.68) 
  4. Length AI Confidence – Medium (0.6) 
  5. Semantic AI Confidence – Medium (0.56) 
  6. Grammar and Spell AI Confidence – Low (0.25) 
  7. Relevancy AI Confidence – Medium (0.5) 
  8. Coherence AI Confidence – High (0.75)
Since the Medium count (4) is higher than the High count (3) and the Low count (1), the final confidence is Medium.

Note: In the final confidence value, 2 stands for High, 1 for Medium and 0 for Low confidence.

So, the final scoring is as follows:
Overall Marks = 6.515/10
AI Confidence = Medium
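
The confidence rules can be sketched as a simple vote over the per-parameter scores. This is a hedged illustration, not the production logic: the function names are hypothetical, and because the thresholds share their boundary values, the sketch assumes that scores of exactly 0.3 and 0.7 fall into the higher band.

```python
from collections import Counter

# Numeric codes for the final confidence, as given in the note above.
CODES = {"Low": 0, "Medium": 1, "High": 2}


def confidence_level(score):
    """Map a 0-1 parameter score to a confidence band."""
    # Assumption: boundary values 0.3 and 0.7 belong to the higher band.
    if score < 0.3:
        return "Low"
    if score < 0.7:
        return "Medium"
    return "High"


def final_confidence(scores):
    """Return (label, code) for the most frequent band; ties give Medium."""
    if not scores:
        raise ValueError("at least one parameter score is required")
    counts = Counter(confidence_level(s) for s in scores)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    # Rule 2: a tie for the highest count resolves to Medium.
    label = winners[0] if len(winners) == 1 else "Medium"
    return label, CODES[label]


# Raw parameter scores from the with-reference example.
with_ref = [0.8, 0.75, 0.68, 0.6, 0.56, 0.25, 0.5, 0.75]
print(final_confidence(with_ref))  # ('Medium', 1)
```

Only a tie for the highest count triggers the Medium fallback here; equal counts among the less frequent bands do not affect the result.
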

Without Reference Text:  
Overall Marks:  
Suppose the maximum marks for a question is 10.

Scores for the parameters mentioned are as follows: 
  1. Topic Similarity – 0.75 (0-1) 
  2. Length Criteria Similarity – 0.75 ({response word count}) 
  3. Grammar & Spell Score – 0.25 (0-1)
  4. Relevancy Occurrence Score – 0.5 (0-1)
  5. Coherence – 0.75 (0-1)
Weightage Assigned to each parameter: 
  1. Weightage of topic similarity – 30% 
  2. Weightage of length similarity – 20% 
  3. Weightage of grammar and spell – 10%
  4. Weightage of relevancy occurrence – 20% 
  5. Weightage of coherence – 20% 
Max marks for each parameter: 
  1. Max marks for topic similarity according to weightage - (30/100) *10 = 3 
  2. Max marks for length similarity according to weightage - (20/100) *10 = 2 
  3. Max marks for grammar and spell according to weightage - (10/100) *10 = 1 
  4. Max marks for relevancy occurrence according to weightage - (20/100) *10 = 2 
  5. Max marks for coherence according to weightage - (20/100) *10 = 2
Score of each parameter: 
  1. Topic similarity out of its max marks = 0.75*3 = 2.25 
  2. Length similarity out of its max marks = 0.75*2 = 1.5 
  3. Grammar and spell out of its max marks = 0.25*1 = 0.25 
  4. Relevancy occurrence out of its max marks = 0.5*2 = 1 
  5. Coherence out of its max marks = 0.75*2 = 1.5 
Overall marks = 2.25 + 1.5 + 0.25 + 1 + 1.5 = 6.5/10
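
The steps above collapse into a single weighted sum. In the formula below, the symbols are notation introduced here for illustration and do not come from EvalAssist itself: M is the maximum marks for the question, w_i is the weightage of parameter i in percent, and s_i is its score in the range 0-1. The second line substitutes the values from this example.

```latex
\[
\text{Overall marks} = M \sum_{i=1}^{n} \frac{w_i}{100}\, s_i
\]
\[
10 \left( 0.30 \cdot 0.75 + 0.20 \cdot 0.75 + 0.10 \cdot 0.25 + 0.20 \cdot 0.5 + 0.20 \cdot 0.75 \right) = 6.5
\]
```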

AI Confidence Calculation

AI confidence is calculated based on the number of parameters selected and the following confidence thresholds.

Confidence Thresholds: 
  1. Low Confidence - (0-0.3) 
  2. Medium Confidence - (0.3-0.7) 
  3. High Confidence - (0.7-1) 
Confidence Calculation General Rules: 
  1. The confidence level that occurs most often across the parameters is assigned to the response.
  2. If two levels, or all three, are tied for the highest count, Medium confidence is assigned.
According to the above example, the confidence calculation is as follows:
  1. Topic AI Confidence – High (0.75) 
  2. Length AI Confidence – High (0.75) 
  3. Grammar and Spell AI Confidence – Low (0.25) 
  4. Relevancy AI Confidence – Medium (0.5) 
  5. Coherence AI Confidence – High (0.75) 
Since the High count (3) is higher than the Medium count (1) and the Low count (1), the final confidence is High.

Note: In the final confidence value, 2 stands for High, 1 for Medium and 0 for Low confidence.

So, the final scoring is as follows:
Overall Marks = 6.5/10 
AI Confidence = High