Identification of Knowledge Gaps in Text

2018 - 2019

Graduate thesis, supervised by Dr. Sudarshan Iyengar, Associate Prof., Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Ropar.

Predicting knowledge gaps in text is a long-standing challenge. A knowledge gap is a region of the text where the information presented lacks clarity or coherence. An automated pipeline to predict such gaps would have high applicability in generating feedback and refining text for school books, education, learning, and other areas of documentation and research.

Developed a novel approach that decomposes the knowledge gap problem into two separate tasks: first segmenting a given text document, then predicting the external and internal gaps.

  1. Segmenting a given text so that each segment consists of consecutive sentences of similar context, based on the semantic similarity of sentences.
  2. Predicting the gaps. The internal gap of a given segment is modelled from the perspective of ease of readability, and various metrics are contrasted for this purpose. The external gap between two segments is predicted by evaluating the generated probability distributions of their topics.
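The two steps can be illustrated with a minimal sketch. It is not the thesis implementation: bag-of-words cosine similarity stands in for the semantic similarity measure, Jensen-Shannon divergence is one assumed way to compare topic distributions, and the names `segment`, `js_divergence`, and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.2):
    """Start a new segment when consecutive sentences fall below the threshold."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bow(prev), bow(cur)) >= threshold:
            segments[-1].append(cur)
        else:
            segments.append([cur])
    return segments


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


sentences = [
    "Cats chase mice.",
    "Mice fear cats.",
    "Stocks fell sharply today.",
    "Investors sold stocks today.",
]
segments = segment(sentences)
# Hypothetical topic distributions for two adjacent segments.
external_gap = js_divergence([0.9, 0.1], [0.1, 0.9])
```

In this sketch a larger divergence between the topic distributions of adjacent segments would be read as a larger external gap. The internal, readability-based gap is not covered here.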

Worked on Distributed Computing at Cryptography and Information Security (CrIS) Lab under Dr. Arpita Patra, Assistant Prof., Department of Computer Science and Automation, Indian Institute of Science.

The task was to analyze existing protocols for Broadcast and Byzantine Agreement, starting with Exponential Information Gathering by Pease, Shostak, and Lamport. We then moved on to polynomial-time algorithms, followed by optimizations in communication and round complexity. Finally, we obtained a protocol, based on an analysis of information, that combines early stopping, fault masking, and the coordinated traversal technique presented by Yoram Moses and Orli Waarts. The result is a highly distributed, asynchronous protocol in which these techniques together greatly restrict the amount and type of damage a faulty processor can cause in a distributed setting.
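As a point of reference for the starting protocol, the sketch below simulates the simplest case of information gathering: four processes, at most one of them faulty, and two rounds of message exchange followed by a majority vote. It only illustrates the idea and is not the protocol obtained in this work. The function names, the `lie` callback, and the `DEFAULT` value are hypothetical.

```python
from collections import Counter

DEFAULT = 0  # assumed fallback when no strict majority exists


def majority(values):
    """Return the strict-majority value, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT


def om1(n, commander_value, faulty, lie):
    """Two-round agreement for processes 0..n-1, where process 0 is the commander.

    Intended for n >= 4 with at most one faulty process.
    `lie(sender, receiver, honest_value)` gives what a faulty sender transmits.
    Returns the decisions of the non-faulty lieutenants.
    """
    lieutenants = range(1, n)
    # Round 1: the commander sends its value to every lieutenant.
    received = {
        i: lie(0, i, commander_value) if 0 in faulty else commander_value
        for i in lieutenants
    }
    # Round 2: every lieutenant relays what it received to the others,
    # then each non-faulty lieutenant takes the majority of what it heard.
    decisions = {}
    for i in lieutenants:
        if i in faulty:
            continue
        heard = [received[i]]
        for j in lieutenants:
            if j != i:
                relayed = lie(j, i, received[j]) if j in faulty else received[j]
                heard.append(relayed)
        decisions[i] = majority(heard)
    return decisions


# A faulty lieutenant flips every value it relays.
honest_commander = om1(4, 1, {3}, lambda s, r, v: 1 - v)
# A faulty commander sends different values to different lieutenants.
faulty_commander = om1(4, 1, {0}, lambda s, r, v: r % 2)
```

In both runs the non-faulty lieutenants reach the same decision, and with an honest commander that decision is the commander's value. Handling more faults this way requires more relay rounds and a message count that grows exponentially, which is what motivates the polynomial-time protocols.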


