
GlossaGen


Creating a glossary and knowledge graph out of scholarly materials and chemistry reviews – instantly

This project is part of the 2024 LLM Hackathon for Materials and Chemistry. Find the public submission of our project – including a product demo – here. Thank you for leaving a ❤️, comment, repost or star!

Curious about example outputs of GlossaGen? Check out an intermediate Weights & Biases report here.

🔥 Usage

From the command line, run GlossaGen to extract a glossary table from a paper:

glossagen # runs the program with the default paper
glossagen path/to/directory/containing/paper # the paper must be called paper.pdf
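
If you call GlossaGen from a script, it can help to check the directory layout first. The following is a minimal sketch, not part of the GlossaGen package: the helper names `find_paper` and `build_command` are hypothetical, and the only facts taken from the usage above are the `glossagen` command and the required file name `paper.pdf`.

```python
from pathlib import Path

PAPER_NAME = "paper.pdf"  # file name required by the usage note above


def find_paper(directory: str) -> Path:
    """Return the path to paper.pdf inside `directory`, or raise if it is missing."""
    paper = Path(directory) / PAPER_NAME
    if not paper.is_file():
        raise FileNotFoundError(f"expected {PAPER_NAME} in {directory}")
    return paper


def build_command(directory: str) -> list[str]:
    """Build the argument list for the `glossagen` CLI (hypothetical helper)."""
    find_paper(directory)  # fail early if the paper is missing
    return ["glossagen", str(directory)]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the package is installed and the `glossagen` command is on your PATH.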

👩‍💻 Installation

Create a new environment and install the package:

conda create -n glossagen python=3.10
conda activate glossagen
pip install -e .

IMPORTANT: Make sure your project directory contains a .env file that defines OPENAI_API_KEY.

# content of the .env file
OPENAI_API_KEY=sk-foo

# if you plan to generate knowledge graphs, also provide Neo4j and Groq credentials
NEO4J_URI=neo4j+s://foo
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=foo
GROQ_API_KEY=gsk_foo
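
To catch a missing key before a run, you could check the .env file yourself. This is a minimal sketch that assumes plain `KEY=VALUE` lines as in the example above; `parse_env` and `missing_keys` are hypothetical helpers and do not reflect how GlossaGen itself loads its configuration.

```python
# Keys named in the .env example above
REQUIRED = ("OPENAI_API_KEY",)
KG_REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GROQ_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so values may contain "="
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str], knowledge_graph: bool = False) -> list[str]:
    """List required keys that are absent or empty (hypothetical helper)."""
    wanted = REQUIRED + (KG_REQUIRED if knowledge_graph else ())
    return [key for key in wanted if not env.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()), knowledge_graph=True)` returns an empty list when all five keys are set.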

🛠️ Development installation

To install the package in editable mode with the test and doc extras, run:

(glossagen) $ pip install -e ".[test,doc]"

Run style checks, coverage, and tests

(glossagen) $ pip install tox
(glossagen) $ tox

Generate coverage badge

Run this after tox has finished, since genbadge reads the coverage.xml report:

(glossagen) $ pip install "genbadge[coverage]"
(glossagen) $ genbadge coverage -i coverage.xml
