magpub.deduplicate

Django-agnostic deduplication logic for publications.

Functions operate on PublicationData instances, or on any duck-typed object that exposes .title, .doi, .source_id, and .year attributes.
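
The duck-typing contract above can be illustrated with a minimal sketch. The `PubRecord` class and the `has_publication_shape` helper are hypothetical stand-ins, not part of magpub, and the field types and defaults are assumptions; only the four attribute names come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# The four attribute names listed in the module description.
REQUIRED_ATTRS = ("title", "doi", "source_id", "year")


@dataclass
class PubRecord:
    """Hypothetical stand-in for PublicationData (types are assumed)."""
    title: str
    doi: Optional[str] = None
    source_id: Optional[str] = None
    year: Optional[int] = None


def has_publication_shape(obj) -> bool:
    """Return True if obj exposes every attribute the functions read."""
    return all(hasattr(obj, name) for name in REQUIRED_ATTRS)


record = PubRecord(title="An Example Paper", year=2020)
print(has_publication_shape(record))
```

Any class with these attributes, such as a Django model or a plain namespace object, should satisfy the same check.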

Functions

find_matches(candidate, existing, *[, ...])

Return existing publications that match the candidate.
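
The full signature is elided on this page, so the following is only a sketch of the general pattern: filter an iterable of existing publications with a pairwise predicate. The names `sketch_find_matches`, `matcher`, and `same_doi` are hypothetical and are not magpub's API.

```python
from types import SimpleNamespace


def same_doi(a, b) -> bool:
    """Assumed predicate: both DOIs present and equal, ignoring case."""
    return bool(a.doi) and bool(b.doi) and a.doi.lower() == b.doi.lower()


def sketch_find_matches(candidate, existing, *, matcher=same_doi):
    """Return the items of `existing` that `matcher` pairs with `candidate`."""
    return [pub for pub in existing if matcher(candidate, pub)]


existing = [
    SimpleNamespace(title="A", doi="10.1000/a", source_id="s1", year=2019),
    SimpleNamespace(title="B", doi="10.1000/b", source_id="s2", year=2020),
    SimpleNamespace(title="C", doi=None, source_id="s3", year=2021),
]
candidate = SimpleNamespace(title="B again", doi="10.1000/B", source_id=None, year=2020)

matches = sketch_find_matches(candidate, existing)
print([pub.title for pub in matches])
```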

group_duplicates(publications[, ...])

Return a list of (duplicate, originals) tuples for likely duplicates.
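
To show the shape of the (duplicate, originals) tuples, here is a rough sketch under stated assumptions: items are scanned in order, an item that matches earlier non-duplicate items is reported together with them, and anything else becomes a potential original. The ordering rule, `sketch_group_duplicates`, and `same_source_id` are illustrative guesses, not the library's documented behavior.

```python
from types import SimpleNamespace


def same_source_id(a, b) -> bool:
    """Assumed predicate: both source IDs present and equal."""
    return a.source_id is not None and a.source_id == b.source_id


def sketch_group_duplicates(publications, *, matcher=same_source_id):
    """Return (duplicate, originals) tuples, scanning in input order."""
    originals_seen = []  # items not flagged as duplicates so far
    groups = []
    for pub in publications:
        originals = [prev for prev in originals_seen if matcher(pub, prev)]
        if originals:
            groups.append((pub, originals))
        else:
            originals_seen.append(pub)
    return groups


first = SimpleNamespace(title="Paper", doi=None, source_id="X1", year=2020)
other = SimpleNamespace(title="Other", doi=None, source_id="Y2", year=2021)
again = SimpleNamespace(title="Paper (copy)", doi=None, source_id="X1", year=2020)

groups = sketch_group_duplicates([first, other, again])
print(len(groups))
```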

is_duplicate(pub1, pub2[, similarity_threshold])

Return True if pub1 and pub2 are likely duplicates.
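
This page does not document the matching rules, so the sketch below shows only one plausible reading of a `similarity_threshold` parameter: compare DOIs when both are present, otherwise fall back to a fuzzy title comparison. The function name `sketch_is_duplicate`, the default of 0.9, the year check, and the use of `difflib.SequenceMatcher` are all assumptions rather than magpub's implementation.

```python
from difflib import SequenceMatcher
from types import SimpleNamespace


def _norm(text) -> str:
    """Lowercase and collapse whitespace; None becomes an empty string."""
    return " ".join((text or "").lower().split())


def sketch_is_duplicate(pub1, pub2, similarity_threshold=0.9) -> bool:
    """Assumed rule set: DOI equality first, then title similarity."""
    doi1, doi2 = _norm(pub1.doi), _norm(pub2.doi)
    if doi1 and doi2:
        return doi1 == doi2
    # Assumption: differing known years rule out a duplicate.
    if pub1.year and pub2.year and pub1.year != pub2.year:
        return False
    ratio = SequenceMatcher(None, _norm(pub1.title), _norm(pub2.title)).ratio()
    return ratio >= similarity_threshold


a = SimpleNamespace(title="Deep Learning", doi="10.1000/ABC", source_id=None, year=2015)
b = SimpleNamespace(title="deep   learning", doi=None, source_id=None, year=2015)
c = SimpleNamespace(title="Graph Theory Basics", doi=None, source_id=None, year=2015)

print(sketch_is_duplicate(a, b), sketch_is_duplicate(a, c))
```

A ratio-based threshold trades recall for precision: lowering it catches more retitled or truncated entries but raises the chance of merging distinct works.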