AI Flags Plagiarism
Industry: Government
Client: NMA
Team: BUILD
Year: 2024
We were hired to build an AI system that detects organized fraud across the 8,000+ funding applications NMA receives each year, while speeding up approval of legitimate ones.

Challenge
NMA receives 8,000+ applications annually. Manual review of submissions in mixed, often unstructured formats makes it hard to detect duplicate financing, plagiarism, and organized fraud in time.
Approach
Built an ingestion pipeline that converts PDF, Word, and JSON submissions into structured data for analysis, using OCR for scanned documents.
Deployed BERT-based NLP models that analyze the structure, vocabulary, and writing style of application texts, automatically detecting similar passages and shared linguistic fingerprints across thousands of documents.
Trained ML models on historical fraud cases to score risk from anomaly indicators (e.g., unusual financial patterns, project inconsistencies, and coordinated submission schemes).
Generated standardized project profiles and dashboards that visualize similarities and risk scores to help investigators triage cases.
Deployed fully within the client’s infrastructure with AES-256 encryption and RBAC; no external services.
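The risk-scoring step can be illustrated with a deliberately simple stand-in. The project trained supervised models on historical fraud cases; the sketch below instead uses an unsupervised robust z-score (median and median absolute deviation) to flag numeric outliers, which shows what an "anomaly indicator" means in practice. The feature names (`requested_amount`, `num_budget_lines`) and the 3.5 cut-off are hypothetical assumptions, not details of the NMA system.

```python
from statistics import median


def robust_z_scores(values: list[float]) -> list[float]:
    """Modified z-scores based on the median and median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (nearly) identical: nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 rescales MAD so scores are comparable to standard z-scores for normal data.
    return [0.6745 * (v - med) / mad for v in values]


def risk_scores(applications: list[dict], features: list[str],
                threshold: float = 3.5) -> dict[str, float]:
    """Score each application by the share of features on which it is an outlier.

    Hypothetical scoring rule: 0.0 means no anomalous features, 1.0 means all.
    """
    flags = {app["id"]: 0 for app in applications}
    for feature in features:
        column = [float(app[feature]) for app in applications]
        for app, z in zip(applications, robust_z_scores(column)):
            if abs(z) > threshold:
                flags[app["id"]] += 1
    return {app_id: count / len(features) for app_id, count in flags.items()}


# Usage with toy data: A4 requests an unusually large amount.
apps = [
    {"id": "A1", "requested_amount": 10000, "num_budget_lines": 5},
    {"id": "A2", "requested_amount": 12000, "num_budget_lines": 6},
    {"id": "A3", "requested_amount": 11000, "num_budget_lines": 5},
    {"id": "A4", "requested_amount": 95000, "num_budget_lines": 6},
    {"id": "A5", "requested_amount": 10500, "num_budget_lines": 7},
]
scores = risk_scores(apps, ["requested_amount", "num_budget_lines"])
```

Median-based statistics are used here because a handful of extreme fraudulent values would inflate a plain mean and standard deviation and partly mask themselves.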
Architecture/Backend
Data extraction from JSON/PDF/Word with OCR and tokenization.
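A minimal sketch of the JSON branch of such an extraction step: nested application data is flattened into dotted field names and selected text fields are tokenized. PDF/Word parsing and OCR need external tools and are not shown. The field names (`project.summary`, `budget`) and the simple lowercase word tokenizer are illustrative assumptions; a BERT pipeline would use the model's own subword tokenizer downstream.

```python
import json
import re

# Hypothetical tokenizer: lowercase runs of word characters (letters, digits, underscore).
TOKEN_PATTERN = re.compile(r"\w+")


def flatten(obj, prefix: str = "") -> dict:
    """Flatten nested dicts/lists into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = obj
    return flat


def tokenize(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text.lower())


def extract_record(raw_json: str, text_fields: tuple[str, ...]) -> dict:
    """Parse one JSON application into flat fields plus tokens from the chosen text fields."""
    fields = flatten(json.loads(raw_json))
    tokens: list[str] = []
    for name in text_fields:
        tokens.extend(tokenize(str(fields.get(name, ""))))
    return {"fields": fields, "tokens": tokens}


# Usage with a toy application.
raw = ('{"applicant": {"name": "Farm Co"}, '
       '"project": {"summary": "Irrigation upgrade, phase 2"}, '
       '"budget": [1200, 800]}')
record = extract_record(raw, ("project.summary",))
```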
AI/ML models
BERT-based NLP for text similarity/pattern analysis of structure, vocabulary, and style.
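Once each application has been encoded into a fixed-length embedding, similarity detection reduces to comparing vectors. The sketch below assumes embeddings were already produced by a BERT-style encoder (BERT-base yields 768-dimensional vectors; toy 3-dimensional vectors stand in here) and flags pairs whose cosine similarity meets a hypothetical threshold of 0.9. It is an illustration of the comparison step, not the project's model.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_pairs(embeddings: dict[str, list[float]],
                  threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair at or above threshold, most similar first."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Usage with toy embeddings: A1 and A2 are near-duplicates, A3 is unrelated.
embeddings = {
    "A1": [0.90, 0.10, 0.00],
    "A2": [0.88, 0.12, 0.01],
    "A3": [0.00, 0.20, 0.95],
}
pairs = similar_pairs(embeddings)
```

This brute-force version compares every pair, which grows quadratically; at 8,000+ documents a year, a batched matrix computation or a nearest-neighbour index would likely be preferable.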
Infrastructure/Deployment
On-premises deployment inside the client's secure infrastructure; AES-256 encryption.
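The deployment also relies on role-based access control (RBAC), mentioned in the Approach section. A minimal deny-by-default permission check could look like this; the role names and permissions are hypothetical, chosen only to show investigators seeing more than reviewers.

```python
from enum import Enum


class Permission(Enum):
    VIEW_RISK_SCORES = "view_risk_scores"
    VIEW_DOCUMENTS = "view_documents"
    EXPORT_REPORTS = "export_reports"
    MANAGE_USERS = "manage_users"


# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS: dict[str, set[Permission]] = {
    "reviewer": {Permission.VIEW_RISK_SCORES},
    "investigator": {Permission.VIEW_RISK_SCORES, Permission.VIEW_DOCUMENTS,
                     Permission.EXPORT_REPORTS},
    "admin": {Permission.MANAGE_USERS},
}


def is_allowed(user_roles: list[str], permission: Permission) -> bool:
    """Deny by default: grant only if some assigned role explicitly includes the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


# Usage
can_read = is_allowed(["reviewer", "investigator"], Permission.VIEW_DOCUMENTS)
```

In this sketch the `admin` role manages accounts but cannot read applications, one way to separate duties so that no single role both grants access and uses it.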
Key results
Automates screening for 8,000+ applications/year, surfacing high-risk cases for focused human review.
Accelerates detection of plagiarism and fraud by combining text-similarity analysis with anomaly-based risk scoring.
Strengthens governance and compliance: on-premises processing with AES-256 encryption and role-based access control keeps sensitive agricultural funding data entirely under the client's control.
Architecture designed to scale to multi-agency use.
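Combining text similarity with anomaly-based risk scoring can be sketched as a weighted sum that ranks applications for human review. The equal 0.5 weighting, the assumption that both signals lie in [0, 1], and the `top_k` cut-off are hypothetical choices for illustration; the source does not describe how the two signals are actually combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseSignals:
    app_id: str
    max_similarity: float  # highest text similarity to any other application, assumed in [0, 1]
    anomaly_score: float   # anomaly-based risk score, assumed in [0, 1]


def combined_risk(case: CaseSignals, similarity_weight: float) -> float:
    """Hypothetical linear blend of the two signals."""
    return similarity_weight * case.max_similarity + (1.0 - similarity_weight) * case.anomaly_score


def triage(cases: list[CaseSignals], similarity_weight: float = 0.5,
           top_k: Optional[int] = None) -> list[tuple[str, float]]:
    """Rank cases by combined risk, highest first; keep only top_k if given."""
    if not 0.0 <= similarity_weight <= 1.0:
        raise ValueError("similarity_weight must be in [0, 1]")
    scored = [(c.app_id, combined_risk(c, similarity_weight)) for c in cases]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Usage: A1 looks copied, A2 looks financially anomalous, A3 looks ordinary.
cases = [
    CaseSignals("A1", max_similarity=0.95, anomaly_score=0.10),
    CaseSignals("A2", max_similarity=0.20, anomaly_score=0.90),
    CaseSignals("A3", max_similarity=0.30, anomaly_score=0.20),
]
queue = triage(cases, top_k=2)
```

A linear blend keeps the ranking easy to explain to investigators; in practice a case scoring very high on either signal alone might also warrant review regardless of its combined score.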