Lobbyist classifier (English (US))
Binary sequence classifier fine-tuned to predict whether a LinkedIn-style job position (title + employer + description) corresponds to a lobbyist (1) or not (0). Trained for the project "Who Becomes a Lobbyist?" (MINISTERIALLOBBY) on Revelio/LinkedIn position text, with labels from the German Bundestag lobby register (DE) or LobbyView (US).
- Base model: distilbert-base-uncased
- Task: sequence classification (2 labels: non-lobbyist, lobbyist)
- Max length: 256 tokens
Evaluation (5-fold CV)
- Mean F1: 0.8942 (± 0.0025)
- Fold F1 scores: 0.8954, 0.8912, 0.8943, 0.8919, 0.8982
- Training samples: 12834 (positive: 6417)
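The reported mean and spread can be reproduced directly from the fold scores. A quick check, assuming the ± figure is the population standard deviation across folds (NumPy's default, ddof=0):

```python
import statistics

fold_f1 = [0.8954220915581689, 0.891170431211499, 0.8943089430894309,
           0.8919135308246597, 0.8982282653481665]

mean_f1 = statistics.fmean(fold_f1)
std_f1 = statistics.pstdev(fold_f1)  # population std (ddof=0)
print(f"Mean F1: {mean_f1:.4f} ± {std_f1:.4f}")  # Mean F1: 0.8942 ± 0.0025
```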
Intended use
- Research: classify past or current job positions as lobby vs non-lobby for career-path and panel analyses.
- Not for commercial use without checking compliance with LinkedIn/Revelio terms.
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "cornelius/lobbyist-classifier-us"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

def predict(texts):
    """Return P(lobbyist) for each input text."""
    inp = tokenizer(texts, truncation=True, max_length=256,
                    padding="max_length", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inp).logits
    probs = torch.softmax(logits, dim=1)
    return probs[:, 1].numpy()  # probability of the "lobbyist" class

# Single position: title + " " + company + " " + description
text = "Senior Public Affairs Manager Acme Corp Government relations and advocacy."
prob = predict([text])[0]
print(f"P(lobbyist) = {prob:.2f}")
is_lobbyist = prob >= 0.95  # high-precision decision threshold
```
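For panel-scale data (many thousands of positions), scoring in mini-batches keeps memory bounded. A minimal sketch, assuming a `predict` function like the one above; the helper name `predict_batched` and the `batch_size` default are illustrative, not part of the released model:

```python
def predict_batched(texts, predict_fn, batch_size=64):
    """Score texts in fixed-size chunks and concatenate the results."""
    probs = []
    for i in range(0, len(texts), batch_size):
        probs.extend(predict_fn(texts[i:i + batch_size]))
    return probs

# Demo with a stand-in scorer; substitute the model's predict in practice.
scores = predict_batched([f"position {i}" for i in range(150)],
                         predict_fn=lambda batch: [0.5] * len(batch))
print(len(scores))  # 150
```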
Citation
If you use this model, please cite the paper "Who Becomes a Lobbyist? Comparative Evidence from the US and Germany" (MINISTERIALLOBBY project, DFG).