glossagen.utils package


glossagen.utils.dspy_utils module

Initialize dspy.

glossagen.utils.dspy_utils.init_dspy(language_model_class: dsp.modules.gpt3.GPT3 = <class 'dsp.modules.gpt3.GPT3'>, max_tokens: int = 3000, model: str = 'gpt-3.5-turbo') → None

Initialize the dspy library with the specified parameters.


Parameters:
language_model_class: The class of the language model to use.
max_tokens (int): The maximum number of tokens to generate.
model (str): The name of the language model to use.



glossagen.utils.pdf_utils module

Base classes for document extraction.

class glossagen.utils.pdf_utils.MetadataSignature(*, publication_text: str, title: str, doi: str)

Bases: Signature

Extracts metadata from the publication text.

doi: str
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'doi': FieldInfo(annotation=str, required=True, json_schema_extra={'desc': 'Digital Object Identifier (DOI) of the publication.', '__dspy_field_type': 'output', 'prefix': 'Doi:'}), 'publication_text': FieldInfo(annotation=str, required=True, json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Publication Text:', 'desc': '${publication_text}'}), 'title': FieldInfo(annotation=str, required=True, json_schema_extra={'desc': 'Title of the publication.', '__dspy_field_type': 'output', 'prefix': 'Title:'})}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

publication_text: str
title: str
class glossagen.utils.pdf_utils.ResearchDoc(*, doc_src: str, fitz_paper: Document | None = None, paper: str = '', metadata_dict: Dict[str, str] = {})

Bases: BaseModel

A research paper.

class Config

Bases: object

Pydantic configuration for the ResearchDoc class.

arbitrary_types_allowed = True
doc_src: str
extract_metadata() → None

Extract metadata from the research paper.

This method uses a metadata extractor to extract the title and DOI from the paper. The extracted metadata is stored in the metadata_dict attribute of the ResearchDoc instance.
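The shape of the result can be illustrated with a small stand-in. The sketch below is not the library's extractor: it swaps in a plain regex heuristic so that it runs without a language model. The function name `extract_metadata_heuristic`, the dictionary keys `title` and `doi` (chosen to mirror the MetadataSignature field names), and the sample text with its made-up DOI are all assumptions for illustration.

```python
import re
from typing import Dict

# DOI-like pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_metadata_heuristic(paper_text: str) -> Dict[str, str]:
    """Regex stand-in for a metadata extractor (hypothetical helper).

    Takes the first non-empty line as the title and the first DOI-like
    string as the DOI. Missing values are returned as empty strings.
    """
    lines = [line.strip() for line in paper_text.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    doi_match = DOI_PATTERN.search(paper_text)
    # Drop sentence punctuation that may trail the DOI in running text.
    doi = doi_match.group(0).rstrip(".,;") if doi_match else ""
    return {"title": title, "doi": doi}


# Made-up sample text; the DOI below is illustrative only.
example_text = "A Study of Glossaries\nhttps://doi.org/10.1234/abcd.5678.\nAbstract ..."
metadata_dict = extract_metadata_heuristic(example_text)
```

A dictionary of this shape is what the description above suggests ends up in metadata_dict, though the exact keys used by the library are not confirmed here.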

fitz_paper: Document | None
classmethod from_dir(paper_dir: str) → ResearchDoc

Create a ResearchDoc instance from a directory containing a research paper.

Parameters:
paper_dir (str): The path of the directory containing the research paper.

Returns:
ResearchDoc: The created ResearchDoc instance.

classmethod from_text(text: str, doc_src: str) → ResearchDoc

Create a ResearchDoc instance from text.

Parameters:
text (str): The text of the research paper.
doc_src (str): The source of the document.

Returns:
ResearchDoc: The created ResearchDoc instance.

metadata_dict: Dict[str, str]
model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'doc_src': FieldInfo(annotation=str, required=True), 'fitz_paper': FieldInfo(annotation=Union[Document, NoneType], required=False, default=None), 'metadata_dict': FieldInfo(annotation=Dict[str, str], required=False, default={}), 'paper': FieldInfo(annotation=str, required=False, default='')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

paper: str
trim_at_references() → None

Trim the document text at the start of the references section.
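One way such trimming could work is sketched below. This is an assumption-laden illustration, not the method's actual implementation: it is written as a pure function rather than a method that updates the instance, and the heading pattern (a line containing only "References" or "Bibliography") is a guess at what marks the start of the section.

```python
import re

# Assumed heading pattern: a line consisting only of "References" or
# "Bibliography", in any letter case, with optional surrounding spaces.
_REFERENCES_HEADING = re.compile(
    r"^[ \t]*(?:references|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def trim_at_references(text: str) -> str:
    """Return the text before the first references heading.

    If no heading is found, the text is returned unchanged.
    """
    match = _REFERENCES_HEADING.search(text)
    if match is None:
        return text
    # Keep everything before the heading and drop trailing whitespace.
    return text[: match.start()].rstrip()


sample = "Introduction\nSome findings.\n\nReferences\n[1] A. Author, 2020."
trimmed = trim_at_references(sample)
```

Matching only a heading on its own line avoids cutting the text at an inline mention such as "see the references in Section 2".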

class glossagen.utils.pdf_utils.ResearchDocLoader(directory: str)

Bases: object

A class for loading research documents from a directory.

load() → ResearchDoc

Load the research document from the specified directory.

Returns:
ResearchDoc: The loaded research document.

Raises:
FileNotFoundError: If the required file ‘paper.pdf’ is not found in the directory.
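The documented failure mode can be sketched with the standard library alone. The helper name `locate_paper` and the error message wording are hypothetical; only the file name paper.pdf and the FileNotFoundError come from the description above. The real loader presumably goes on to parse the PDF, which this sketch does not attempt.

```python
import tempfile
from pathlib import Path

REQUIRED_FILENAME = "paper.pdf"  # file name taken from the documented error


def locate_paper(directory: str) -> Path:
    """Return the path to paper.pdf in `directory` (hypothetical helper).

    Raises FileNotFoundError if the file is missing, mirroring the
    behaviour documented for load().
    """
    pdf_path = Path(directory) / REQUIRED_FILENAME
    if not pdf_path.is_file():
        raise FileNotFoundError(
            f"Required file '{REQUIRED_FILENAME}' not found in {directory}"
        )
    return pdf_path


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory triggers the documented error.
    try:
        locate_paper(tmp)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # Once a file with the expected name exists, the lookup succeeds.
    (Path(tmp) / REQUIRED_FILENAME).write_bytes(b"%PDF-1.4\n")
    found_name = locate_paper(tmp).name
```

Callers that load many paper directories may want to catch FileNotFoundError per directory so that one missing file does not stop the whole batch.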

glossagen.utils.pdf_utils.main() → None

Demonstrate the usage of the ResearchDoc class.

Module contents