glossagen.utils package
Submodules
glossagen.utils.dspy_utils module
Utilities for initializing DSPy.
- glossagen.utils.dspy_utils.init_dspy(language_model_class: dsp.modules.gpt3.GPT3 = dsp.modules.gpt3.GPT3, max_tokens: int = 3000, model: str = 'gpt-3.5-turbo') -> None
Initialize the dspy library with the specified parameters.
Args:
    language_model_class: The class of the language model to use.
    max_tokens (int): The maximum number of tokens to generate.
    model (str): The name of the language model to use.
Returns:
    None
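init_dspy takes the language model class itself (not an instance) as a parameter with a default, so a caller can substitute a different class. The sketch below illustrates that pattern without depending on DSPy. `StubLM`, `OtherLM`, `SETTINGS`, and `init_dspy_sketch` are hypothetical stand-ins, and the assumption that init_dspy instantiates the class with `model` and `max_tokens` and then registers the result globally is not confirmed by this page.

```python
from typing import Any, Dict, Type


class StubLM:
    """Hypothetical stand-in for a language model class such as GPT3."""

    def __init__(self, model: str, max_tokens: int) -> None:
        self.model = model
        self.max_tokens = max_tokens


# Hypothetical stand-in for DSPy's global settings object.
SETTINGS: Dict[str, Any] = {}


def init_dspy_sketch(
    language_model_class: Type[StubLM] = StubLM,
    max_tokens: int = 3000,
    model: str = "gpt-3.5-turbo",
) -> None:
    """Assumed behavior: build the LM from the class, then register it globally."""
    lm = language_model_class(model=model, max_tokens=max_tokens)
    SETTINGS["lm"] = lm


# Defaults mirror the documented signature.
init_dspy_sketch()
print(SETTINGS["lm"].model, SETTINGS["lm"].max_tokens)  # gpt-3.5-turbo 3000


class OtherLM(StubLM):
    """A different (hypothetical) language model class swapped in by the caller."""


init_dspy_sketch(language_model_class=OtherLM, max_tokens=500)
```

In DSPy 2.x, the global registration step is typically done with `dspy.settings.configure(lm=...)`; the dict here only stands in for that.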
glossagen.utils.pdf_utils module
Base classes for document extraction.
- class glossagen.utils.pdf_utils.MetadataSignature(*, publication_text: str, title: str, doi: str)
Bases: Signature
Extracts metadata from the publication text.
- doi: str
Output field: the Digital Object Identifier (DOI) of the publication.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'doi': FieldInfo(annotation=str, required=True, json_schema_extra={'desc': 'Digital Object Identifier (DOI) of the publication.', '__dspy_field_type': 'output', 'prefix': 'Doi:'}), 'publication_text': FieldInfo(annotation=str, required=True, json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Publication Text:', 'desc': '${publication_text}'}), 'title': FieldInfo(annotation=str, required=True, json_schema_extra={'desc': 'Title of the publication.', '__dspy_field_type': 'output', 'prefix': 'Title:'})}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- publication_text: str
Input field: the text of the publication.
- title: str
Output field: the title of the publication.
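The model_fields value above shows that each MetadataSignature field carries a prefix (Publication Text:, Title:, Doi:), a desc, and a __dspy_field_type of input or output. The sketch below copies that metadata into a plain dict and lays it out as a fill-in template, with the input filled in and the outputs left for the model. `FIELDS` and `render_template` are hypothetical, and the layout is only an illustration, not the prompt format DSPy actually emits.

```python
from typing import Dict, List

# Field metadata copied from MetadataSignature.model_fields (json_schema_extra),
# ordered as in the constructor signature: input first, then outputs.
FIELDS: Dict[str, Dict[str, str]] = {
    "publication_text": {"type": "input", "prefix": "Publication Text:"},
    "title": {"type": "output", "prefix": "Title:", "desc": "Title of the publication."},
    "doi": {
        "type": "output",
        "prefix": "Doi:",
        "desc": "Digital Object Identifier (DOI) of the publication.",
    },
}


def render_template(inputs: Dict[str, str]) -> str:
    """Hypothetical layout: inputs are filled in, outputs show their desc as a blank."""
    lines: List[str] = []
    for name, meta in FIELDS.items():
        if meta["type"] == "input":
            lines.append(f"{meta['prefix']} {inputs[name]}")
        else:
            lines.append(f"{meta['prefix']} <{meta['desc']}>")
    return "\n".join(lines)


print(render_template({"publication_text": "Example publication text."}))
# Publication Text: Example publication text.
# Title: <Title of the publication.>
# Doi: <Digital Object Identifier (DOI) of the publication.>
```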
- class glossagen.utils.pdf_utils.ResearchDoc(*, doc_src: str, fitz_paper: Document | None = None, paper: str = '', metadata_dict: Dict[str, str] = {})
Bases: BaseModel
A research paper.
- class Config
Bases: object
Pydantic configuration for the ResearchDoc class.
- arbitrary_types_allowed = True
- doc_src: str
- extract_metadata() -> None
Extract metadata from the research paper.
This method uses a metadata extractor to extract the title and DOI from the paper. The extracted metadata is stored in the metadata_dict attribute of the ResearchDoc instance.
- fitz_paper: Document | None
- classmethod from_dir(paper_dir: str) -> ResearchDoc
Create a ResearchDoc instance from a directory containing a research paper.
Args:
    paper_dir (str): Path to the directory containing the research paper.
Returns:
    ResearchDoc: The created ResearchDoc instance.
- classmethod from_text(text: str, doc_src: str) -> ResearchDoc
Create a ResearchDoc instance from text.
Args:
    text (str): The text of the research paper.
    doc_src (str): The source of the document.
Returns:
    ResearchDoc: The created ResearchDoc instance.
- metadata_dict: Dict[str, str]
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'doc_src': FieldInfo(annotation=str, required=True), 'fitz_paper': FieldInfo(annotation=Union[Document, NoneType], required=False, default=None), 'metadata_dict': FieldInfo(annotation=Dict[str, str], required=False, default={}), 'paper': FieldInfo(annotation=str, required=False, default='')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- paper: str
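The sketch below mirrors ResearchDoc with a standard-library dataclass so that it runs without pydantic, PyMuPDF, or DSPy, and illustrates two members under stated assumptions: from_text is assumed to store the text in paper and leave fitz_paper unset, and extract_metadata is assumed to call an extractor whose result exposes title and doi (the MetadataSignature output fields) and to store them under the keys "title" and "doi". `ResearchDocSketch`, `Prediction`, and `fake_extractor` are hypothetical; the real method uses a DSPy-based metadata extractor rather than this stub.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class Prediction:
    """Hypothetical extractor result carrying the MetadataSignature outputs."""
    title: str
    doi: str


def fake_extractor(publication_text: str) -> Prediction:
    # Stub standing in for a language-model call: reads the first two lines.
    first, second = publication_text.splitlines()[:2]
    return Prediction(title=first.strip(), doi=second.strip())


@dataclass
class ResearchDocSketch:
    """Hypothetical mirror of ResearchDoc's documented fields and defaults."""
    doc_src: str
    fitz_paper: Optional[Any] = None  # a fitz.Document when loaded from a PDF
    paper: str = ""
    # Dataclasses reject a literal {} default, so each instance gets its own dict.
    metadata_dict: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_text(cls, text: str, doc_src: str) -> "ResearchDocSketch":
        """Assumed behavior: keep the raw text and leave fitz_paper unset."""
        return cls(doc_src=doc_src, paper=text)

    def extract_metadata(
        self, extractor: Callable[[str], Prediction] = fake_extractor
    ) -> None:
        """Run the extractor on the paper text and store title and DOI in place."""
        prediction = extractor(self.paper)
        self.metadata_dict["title"] = prediction.title
        self.metadata_dict["doi"] = prediction.doi


doc = ResearchDocSketch.from_text(
    "A Sample Title\n10.1234/example.5678\nBody text.", doc_src="manual_input"
)
doc.extract_metadata()
print(doc.metadata_dict)  # {'title': 'A Sample Title', 'doi': '10.1234/example.5678'}
```

extract_metadata mutates metadata_dict and returns nothing, consistent with its documented None return. The pydantic original needs arbitrary_types_allowed = True because a PyMuPDF Document is not a type pydantic knows how to validate; a plain dataclass performs no such validation.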
- class glossagen.utils.pdf_utils.ResearchDocLoader(directory: str)
Bases: object
A class for loading research documents from a directory.
- load() -> ResearchDoc
Load the research document from the specified directory.
Returns:
    ResearchDoc: The loaded research document.
Raises:
    FileNotFoundError: If the required file 'paper.pdf' is not found in the directory.
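The only documented failure mode of load() is a missing paper.pdf, which suggests the loader expects that fixed file name inside the directory. A minimal sketch of that check with pathlib, assuming the PDF sits directly at `<directory>/paper.pdf`; `resolve_paper_path` is a hypothetical helper, and the real loader would then open the file (the fitz_paper field suggests PyMuPDF) and build a ResearchDoc.

```python
import tempfile
from pathlib import Path


def resolve_paper_path(directory: str) -> Path:
    """Hypothetical helper: return <directory>/paper.pdf or raise FileNotFoundError."""
    pdf_path = Path(directory) / "paper.pdf"
    if not pdf_path.is_file():
        raise FileNotFoundError(f"Required file 'paper.pdf' not found in {directory}")
    return pdf_path


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory triggers the documented error.
    try:
        resolve_paper_path(tmp)
    except FileNotFoundError as exc:
        print("error:", exc)

    # Once paper.pdf exists, the path resolves.
    (Path(tmp) / "paper.pdf").write_bytes(b"%PDF-1.4\n")
    print(resolve_paper_path(tmp).name)  # paper.pdf
```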