glossagen.utils package¶

Submodules¶

glossagen.utils.dspy_utils module¶

Initialize dspy.

glossagen.utils.dspy_utils.init_dspy(language_model_class: ~dsp.modules.gpt3.GPT3 = <class 'dsp.modules.gpt3.GPT3'>, max_tokens: int = 3000, model: str = 'gpt-3.5-turbo') → None[source]¶

Initialize the dspy library with the specified parameters.

Args:

language_model_class: The class of the language model to use.

max_tokens (int): The maximum number of tokens to generate.

model (str): The name of the language model to use.

Returns¶

None
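As a usage illustration, the call below passes the documented keyword arguments explicitly. It is a minimal sketch: the wrapper name `configure_dspy` is hypothetical, the import is deferred so the snippet loads without glossagen installed, and it assumes that any credentials the chosen language model needs are already configured in the environment.

```python
def configure_dspy(model: str = "gpt-3.5-turbo", max_tokens: int = 3000) -> None:
    """Hypothetical wrapper: call init_dspy with explicit keyword arguments."""
    # Imported lazily so this module can be loaded without glossagen installed.
    from glossagen.utils.dspy_utils import init_dspy

    # language_model_class is left at its documented default.
    init_dspy(max_tokens=max_tokens, model=model)
```

Calling `configure_dspy(max_tokens=1000)` would then set up dspy with a smaller generation budget while keeping the default model name.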

glossagen.utils.pdf_utils module¶

Base classes for document extraction.

class glossagen.utils.pdf_utils.MetadataSignature(*, publication_text: str, title: str, doi: str)[source]¶

Bases: Signature

Extracts metadata from the publication text.

doi: str¶

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}¶

Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'doi': FieldInfo(annotation=str, required=True, json_schema_extra={'desc': 'Digital Object Identifier (DOI) of the publication.', '__dspy_field_type': 'output', 'prefix': 'Doi:'}), 'publication_text': FieldInfo(annotation=str, required=True, json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Publication Text:', 'desc': '${publication_text}'}), 'title': FieldInfo(annotation=str, required=True, json_schema_extra={'desc': 'Title of the publication.', '__dspy_field_type': 'output', 'prefix': 'Title:'})}¶

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

publication_text: str¶

title: str¶

class glossagen.utils.pdf_utils.ResearchDoc(*, doc_src: str, fitz_paper: Document | None = None, paper: str = '', metadata_dict: Dict[str, str] = {})[source]¶

Bases: BaseModel

A research paper.

class Config[source]¶

Bases: object

Pydantic configuration for the ResearchDoc class.

arbitrary_types_allowed = True¶

doc_src: str¶

extract_metadata() → None[source]¶

Extract metadata from the research paper.

This method uses a metadata extractor to extract the title and DOI from the paper. The extracted metadata is stored in the metadata_dict attribute of the ResearchDoc instance.

fitz_paper: Document | None¶

classmethod from_dir(paper_dir: str) → ResearchDoc[source]¶

Create a ResearchDoc instance from a directory containing a research paper.

Args:

paper_dir (str): The path of the directory containing the research paper.

Returns¶

ResearchDoc: The created ResearchDoc instance.

classmethod from_text(text: str, doc_src: str) → ResearchDoc[source]¶

Create a ResearchDoc instance from text.

Args:

text (str): The text of the research paper.

doc_src (str): The source of the document.

Returns¶

ResearchDoc: The created ResearchDoc instance.

metadata_dict: Dict[str, str]¶

model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}¶

Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'doc_src': FieldInfo(annotation=str, required=True), 'fitz_paper': FieldInfo(annotation=Union[Document, NoneType], required=False, default=None), 'metadata_dict': FieldInfo(annotation=Dict[str, str], required=False, default={}), 'paper': FieldInfo(annotation=str, required=False, default='')}¶

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

paper: str¶

trim_at_references() → None[source]¶

Trim the document text at the start of the references section.
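The trimming step can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the library's implementation: the function name `trim_at_references_sketch` is hypothetical, the heading is assumed to be a line containing only "References" or "Bibliography", and the sketch returns the trimmed string, whereas the documented method returns None and presumably updates the instance in place.

```python
import re

# Assumed heading pattern: a line consisting only of "References" or "Bibliography".
_REFERENCES_HEADING = re.compile(
    r"^[ \t]*(references|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def trim_at_references_sketch(text: str) -> str:
    """Return the text that precedes the first references heading."""
    match = _REFERENCES_HEADING.search(text)
    if match is None:
        # No heading found: leave the text unchanged.
        return text
    # Drop the heading and everything after it, plus trailing whitespace.
    return text[: match.start()].rstrip()
```

Because the pattern must match a whole line, a sentence that merely mentions the word "references" does not trigger the trim.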

class glossagen.utils.pdf_utils.ResearchDocLoader(directory: str)[source]¶

Bases: object

A class for loading research documents from a directory.

load() → ResearchDoc[source]¶

Load the research document from the specified directory.

Returns¶

ResearchDoc: The loaded research document.

Raises¶

FileNotFoundError: If the required file ‘paper.pdf’ is not found in the directory.
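The documented failure mode can be sketched with the standard library alone. The helper name `find_paper_pdf` is hypothetical and the error message is illustrative; only the file name 'paper.pdf' and the exception type come from the description above.

```python
from pathlib import Path


def find_paper_pdf(directory: str) -> Path:
    """Return the path to 'paper.pdf' inside directory, or raise if it is missing."""
    pdf_path = Path(directory) / "paper.pdf"
    if not pdf_path.is_file():
        # Mirrors the documented behaviour: a missing file is an error.
        raise FileNotFoundError(
            f"Required file 'paper.pdf' not found in {directory}"
        )
    return pdf_path
```

Checking for the file before opening it lets a caller report a clear message that names the directory instead of surfacing a lower-level error from the PDF reader.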

glossagen.utils.pdf_utils.main() → None[source]¶

Demonstrate the usage of the ResearchDoc class.

Module contents¶