We Tested 5 AI Tools for Underwriting. Here's What Actually Worked.
The market is flooded with AI tools promising to "revolutionize" real estate. But when you give them a dense Offering Memorandum, a scanned T-12, and a messy Rent Roll, what happens? We ran the tests. Here is an honest assessment of what works, what hallucinates, and where the limitations lie.
The Methodology
We took a real-world, mid-market multifamily deal (names redacted). The data package included:
- A 40-page Offering Memorandum (PDF)
- A Trailing 12-Month Operating Statement (scanned PDF, slightly skewed)
- A Current Rent Roll (Excel export saved as PDF)
We tested five tools:
- ChatGPT (GPT-4) - The generalist benchmark
- Claude 3.5 Sonnet - Known for its long context window
- PropTech Tool A - A specialized, venture-backed startup
- PropTech Tool B - An established legacy vendor adding "AI features"
- REMI - Our platform (we will be objective, even about our flaws)
The Test: Extraction & Conflict Detection
The goal was simple: extract the stabilized Year 1 NOI, compute the average rent per square foot, and cross-reference the occupancy stated in the OM against the actual Rent Roll.
| Tool | NOI Extraction | Rent Roll Accuracy | Conflict Detection |
|---|---|---|---|
| ChatGPT (GPT-4) | Good | Hallucinated | Failed |
| Claude 3.5 Sonnet | Excellent | Good | Missed nuance |
| PropTech Tool A | Excellent | Slow | Good |
| PropTech Tool B | Clunky UI | Failed OCR | Not supported |
| REMI | Perfect (Cited) | Excellent | Flagged 3% delta |
The Generalists (GPT-4 & Claude)
The Good: They are incredibly fast and cost next to nothing. If you just need a quick summary of a clean PDF, they work. Claude handled the long context of the OM beautifully.
The Bad: They fail at OCR (Optical Character Recognition) on messy, scanned financials. ChatGPT hallucinated rent roll totals by misaligning columns. Crucially, they don't provide reliable citations, making the output dangerous to use in an IC Memo without manual verification.
The Specialists (PropTech A & B)
The Good: Tool A had specific real estate ontology built in. It understood that "GPR" meant Gross Potential Rent.
The Bad: Tool B felt like a wrapper around an old OCR engine. It struggled with the skewed T-12. Tool A was accurate but required a clunky mapping process before it could ingest the data.
REMI: An Objective Assessment
What worked: REMI excelled because it is built for this specific workflow. It didn't just extract the NOI; it linked directly to the cell in the T-12. It successfully cross-referenced the OM (claiming 95% occupancy) with the Rent Roll (which showed 92% economic occupancy) and flagged the 3% delta for review.
Our Limitations: REMI currently requires files to be uploaded in standard formats (PDF, Excel). If you have highly obscure, proprietary accounting exports, you may need to map them once before the system learns the format.
Build vs. Buy: The Real Cost
Many GPs look at GPT-4's $20/month price tag and think, "We'll just build our own tool using the API."
The cost of building a secure, cited, SOC2-compliant extraction engine that handles real estate nuances is north of $500,000/year in data science and engineering salaries. REMI costs between $60 and $175 per user per month. The "Build vs. Buy" math for AI in 2026 heavily favors "Buy."
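The back-of-envelope math, using the article's own figures (the build cost is the stated estimate, not a quote):

```python
BUILD_ANNUAL = 500_000        # estimated yearly engineering + data science cost
BUY_PER_USER_MONTHLY = 175    # top of the quoted REMI price range

def breakeven_seats(build_annual: float, per_user_monthly: float) -> int:
    """Seats at which a year of buying costs as much as a year of building."""
    return int(build_annual // (per_user_monthly * 12))

print(breakeven_seats(BUILD_ANNUAL, BUY_PER_USER_MONTHLY))  # prints 238
```

Even at the top of the price range, a GP shop would need roughly 238 seats before buying costs as much as one year of building, and that ignores ongoing maintenance on the built tool.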
Download "The GP's AI Toolkit"
Get the full, unredacted 20-page report detailing our tests, including pricing, use cases, and specific prompts to use when testing tools yourself.