magpub: import and deduplicate academic publications
magpub is a Python library for fetching academic publications from scholarly databases and detecting duplicates across sources.
It is used by the Chameleon Cloud Portal to track research outputs that cite the testbed, but the library itself is Django-agnostic and can be embedded in any Python project.
Installation
pip install magpub
Optional extras:
# Scopus and ScienceDirect support (requires pybliometrics)
pip install "magpub[scopus]"
# Documentation build dependencies
pip install "magpub[docs]"
Quick start
Searching a source
from magpub.sources.scopus import ScopusClient
client = ScopusClient(
    api_key="your-api-key",
    institution_token="your-token",
)
for pub in client.search('TITLE("Chameleon Cloud")'):
    print(pub.title, pub.doi)
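The search string above uses Scopus advanced-search syntax. When queries are assembled programmatically, a small helper can keep them consistent. This is a hypothetical sketch, not part of magpub: `build_scopus_query` and its parameters are illustrative names, and it assumes the phrase contains no double quotes. It relies only on the standard Scopus field codes `TITLE`, `TITLE-ABS-KEY`, and `PUBYEAR`.

```python
def build_scopus_query(phrase, field="TITLE-ABS-KEY", min_year=None):
    """Compose a Scopus advanced-search string.

    Hypothetical helper, not part of magpub. Assumes `phrase` contains
    no double quotes.
    """
    # Wrap the phrase in double quotes so Scopus treats it as an exact phrase
    query = f'{field}("{phrase}")'
    if min_year is not None:
        # PUBYEAR > N restricts results to items published after year N
        query += f" AND PUBYEAR > {min_year}"
    return query


print(build_scopus_query("Chameleon Cloud", field="TITLE"))
# TITLE("Chameleon Cloud")
print(build_scopus_query("Chameleon testbed", min_year=2019))
# TITLE-ABS-KEY("Chameleon testbed") AND PUBYEAR > 2019
```

The resulting string can be passed to `client.search(...)` in the same way as the literal query in the example above.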
Deduplicating publications
from magpub.deduplicate import find_matches
from magpub.models import PublicationData
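# Publications already on record
existing = [PublicationData(title="Hello", doi="10.1234/a", year=2020)]
# A newly fetched publication to check against the existing records
new_pub = PublicationData(title="Hello", doi="10.1234/a", year=2020)
matches = find_matches(new_pub, existing)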
print(f"Found {len(matches)} duplicate(s).")
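The matching rules inside `find_matches` are not described here. As a rough illustration of how cross-source duplicate detection commonly works, the sketch below compares normalized DOIs first and falls back to a normalized title plus year. It is a stand-in built on assumptions, not magpub's algorithm: the `Pub` dataclass and the `naive_matches`, `normalize_doi`, and `normalize_title` helpers are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pub:
    """Hypothetical minimal record; not magpub's PublicationData."""
    title: str
    doi: Optional[str] = None
    year: Optional[int] = None


_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    # DOIs are case-insensitive; also strip resolver URLs and "doi:" prefixes
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def normalize_title(title: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse runs of whitespace
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def naive_matches(new: Pub, existing: List[Pub]) -> List[Pub]:
    matches = []
    for pub in existing:
        # Strongest signal: both records carry the same DOI
        if new.doi and pub.doi and normalize_doi(new.doi) == normalize_doi(pub.doi):
            matches.append(pub)
        # Otherwise compare normalized title and publication year
        elif normalize_title(new.title) == normalize_title(pub.title) and new.year == pub.year:
            matches.append(pub)
    return matches


existing = [Pub("Hello, World!", "10.1234/A", 2020), Pub("Other paper", None, 2021)]
print(len(naive_matches(Pub("hello world", "https://doi.org/10.1234/a", 2020), existing)))  # 1
```

Exact comparison of normalized strings misses titles that differ by a word or a typo; a string-similarity threshold is a common refinement when that matters.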
Working with BibTeX
from magpub.utils import get_pub_type, get_forum, get_link, get_month
entry = {
    "ENTRYTYPE": "article",
    "title": "My Paper",
    "journal": "Nature",
    "year": "2024",
    "month": "Mar",
    "doi": "10.1234/example",
}
print(get_pub_type(entry)) # "journal article"
print(get_forum(entry)) # "Nature"
print(get_link(entry)) # "https://doi.org/10.1234/example"
print(get_month(entry)) # 3
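BibTeX month fields turn up in several forms: three-letter macros such as `mar`, full names, braced strings, or plain numbers. The hypothetical helper below sketches one way to normalize them to an integer like the `get_month` output shown above. It is not magpub's implementation; `parse_bibtex_month` is an illustrative name, and it deliberately returns `None` for anything it does not recognize (for example `Sept`).

```python
from typing import Optional

# English month names; hardcoded so the result does not depend on the locale
_MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def parse_bibtex_month(value) -> Optional[int]:
    """Return 1-12 for a BibTeX month value, or None if unrecognized.

    Hypothetical helper, not part of magpub.
    """
    # Drop surrounding braces, whitespace and a trailing period ("Mar.")
    text = str(value).strip().strip("{}").strip().rstrip(".").lower()
    if text.isdigit():
        number = int(text)
        return number if 1 <= number <= 12 else None
    # Accept either the full name or the standard three-letter abbreviation
    for number, name in enumerate(_MONTH_NAMES, start=1):
        if text in (name, name[:3]):
            return number
    return None


print(parse_bibtex_month("Mar"))      # 3
print(parse_bibtex_month("{March}"))  # 3
print(parse_bibtex_month("12"))       # 12
```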