magpub: import and deduplicate academic publications

magpub is a Python library for fetching academic publications from open scholarly databases and detecting duplicates across sources.

It is used by the Chameleon Cloud Portal to track research outputs that cite the testbed, but the library itself is Django-agnostic and can be embedded in any Python project.

Installation

pip install magpub

Optional extras:

# Scopus and ScienceDirect support (requires pybliometrics)
pip install magpub[scopus]

# Documentation build dependencies
pip install magpub[docs]

Quick start

Searching a source

from magpub.sources.scopus import ScopusClient

client = ScopusClient(
    api_key="your-api-key",
    institution_token="your-token",
)
for pub in client.search('TITLE("Chameleon Cloud")'):
    print(pub.title, pub.doi)

Deduplicating publications

from magpub.deduplicate import find_matches
from magpub.models import PublicationData

existing = [PublicationData(title="Hello", doi="10.1234/a", year=2020)]
new_pub = PublicationData(title="Hello", doi="10.1234/a", year=2020)

matches = find_matches(new_pub, existing)
print(f"Found {len(matches)} duplicate(s).")

Working with BibTeX

from magpub.utils import get_pub_type, get_forum, get_link, get_month

entry = {
    "ENTRYTYPE": "article",
    "title": "My Paper",
    "journal": "Nature",
    "year": "2024",
    "month": "Mar",
    "doi": "10.1234/example",
}

print(get_pub_type(entry))   # "journal article"
print(get_forum(entry))      # "Nature"
print(get_link(entry))       # "https://doi.org/10.1234/example"
print(get_month(entry))      # 3