Extracting Structured Data from PDF Files Using OpenAI & LangChain
Aim

The aim of this project is to build a paper parser that extracts the questions from an exam file and converts them to JSON, using LangChain and OpenAI.

Technologies Used:-

The code is written in JavaScript and uses the following technologies:

dotenv: A module for loading environment variables from a .env file.
zod: A TypeScript-first schema validation library.
LangChain: A framework designed to simplify the creation of applications using large language models.
fs: A built-in Node.js module for working with the file system.

Key Challenges:-

Here are the key challenges we faced in the code, with some tips for overcoming them:

1. Writing the Prompt: One challenge was writing a prompt that guides the OpenAI language model to extract the desired information accurately, without making up questions or generating irrelevant responses. To address this challenge, here are some tips:

Clearly specify the format and structure of the expected output.
Include explicit instructions