What AI does well, briefly.
For context, the areas where AI performs well:
- Auto-coding repeating transactions: 92-98 percent accuracy after 60 days of learning per organization.
- OCR extraction from clean PDF invoices: 88-95 percent accuracy on key fields.
- Bank statement reconciliation: 90+ percent automatic match rate on standard formats.
- Anomaly detection on AP/AR: catches duplicates, off-pattern vendors, and unusual amounts, with a false-positive rate of around 12 percent.
- Natural-language (NL) queries against the ledger: works well for read-only reporting.
These wins are real and shift the bookkeeper role from typist to reviewer. But they are not the whole job, and the rest of this post is about what the AI does not do.
Limit 1: judgement calls about contracts.
When a $50,000 payment comes in, is it a deposit on machinery, a prepaid expense, an advance from a customer, or revenue? The transaction does not say. The contract does. The AI can guess based on counterparty and amount but the call belongs to a person who has read the contract.
For SMBs with significant project-based or contract-based revenue, this means an experienced person reviews any non-routine payment above a threshold (often $5,000-$10,000). The AI flags it for review; the person makes the call. This will likely remain true for the foreseeable future because the source data (contracts) lives outside the accounting system.
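The flag-then-review rule described above can be sketched in a few lines. This is a minimal illustration, not any product's actual logic: the `Payment` type, the `is_routine` field (assumed to be set upstream by pattern matching), and the $5,000 threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical threshold; the post cites $5,000-$10,000 as a common range.
REVIEW_THRESHOLD = Decimal("5000")


@dataclass(frozen=True)
class Payment:
    counterparty: str
    amount: Decimal
    is_routine: bool  # assumed to be set upstream, e.g. by recurring-pattern matching


def needs_contract_review(payment: Payment, threshold: Decimal = REVIEW_THRESHOLD) -> bool:
    """True when a non-routine payment is at or above the review threshold."""
    return (not payment.is_routine) and payment.amount >= threshold


payments = [
    Payment("Acme Leasing", Decimal("50000"), is_routine=False),
    Payment("City Utilities", Decimal("8200"), is_routine=True),
    Payment("New Customer Ltd", Decimal("1200"), is_routine=False),
]

# Only flagged payments go to the person who has read the contract.
review_queue = [p for p in payments if needs_contract_review(p)]
```

The code only decides who looks at the payment. The classification itself (deposit, prepaid expense, customer advance, or revenue) still comes from the contract.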
Limit 2: tax-policy decisions.
Whether a US vendor is a 1099-reportable contractor or a corporation can be checked against a maintained W-9 record. But the classification edge cases (single-member LLCs, attorneys, medical providers, reportable versus non-reportable payments), the documentation needed, and the cross-border edge cases (treaty rates, FATCA, ECI vs FDAP) require human tax knowledge. The same applies to UK CIS, Australian PAYG-W contractor cases, Canadian T4A reporting, and similar regimes elsewhere. The AI applies rules; tax policy is interpretation.
Concretely: Nonari maintains the withholding-tax rate logic and integrates with vendor classification records. The AI applies these correctly for routine cases. But a foreign technology vendor under a tax treaty, a contractor in a special sector, or a payment with mixed components requires a CPA, chartered accountant, or tax consultant. The boundary is "rule application" (AI) versus "rule interpretation" (human).
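One way to picture the boundary is as a routing function that never computes tax for the interpretive cases. The sketch below is illustrative only: the `VendorPayment` fields and the `Route` enum are hypothetical names, and it is not a description of how Nonari implements this.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    APPLY_RULE = "apply_rule"      # routine: the maintained rate logic applies
    HUMAN_REVIEW = "human_review"  # interpretation: CPA or tax consultant decides


@dataclass(frozen=True)
class VendorPayment:
    vendor: str
    is_foreign: bool
    treaty_claimed: bool
    special_sector: bool
    mixed_components: bool


def route_withholding(p: VendorPayment) -> Route:
    """Send the three interpretive cases named above to a person; apply rules otherwise."""
    if (p.is_foreign and p.treaty_claimed) or p.special_sector or p.mixed_components:
        return Route.HUMAN_REVIEW
    return Route.APPLY_RULE


routine = VendorPayment("Local Print Shop", False, False, False, False)
treaty = VendorPayment("Overseas Software GmbH", True, True, False, False)
```

The design choice is that the function returns a route, not a rate, so an interpretive case cannot be silently coded with a default rate.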
Limit 3: messy unstructured input.
Handwritten receipts in non-Latin scripts, photos of crumpled paper receipts, faxed invoices from rural suppliers, voice notes describing a payment, low-resource languages — all still problematic for AI extraction. OCR handles printed receipts in major languages reasonably well but handwritten content in low-resource or non-Latin scripts remains weak (40-60 percent accuracy depending on hand and language). Photos of paper receipts at angles in poor light: usable but error-prone.
The practical workaround: structure the inputs upstream. Require vendors to send PDF invoices via email where possible. Use a flat-bed scanner at end-of-day for paper receipts (much better than phone photos). Provide a simple form for cash receipts rather than describing them in chat messages. The AI gets dramatically better with structured input; do not ask it to do magic on unstructured input.
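As an illustration of "a simple form for cash receipts", the sketch below validates four fields before anything reaches the ledger. The field names (`paid_on`, `vendor`, `amount`, `purpose`) and the `parse_cash_receipt` helper are assumptions made for this example, not a required schema.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("paid_on", "vendor", "amount", "purpose")


@dataclass(frozen=True)
class CashReceipt:
    paid_on: date
    vendor: str
    amount: Decimal
    purpose: str


def parse_cash_receipt(form: dict[str, str]) -> CashReceipt:
    """Turn raw form fields into a typed record; raise ValueError on bad input."""
    missing = [k for k in REQUIRED_FIELDS if not form.get(k, "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    try:
        amount = Decimal(form["amount"])
    except InvalidOperation as exc:
        raise ValueError("amount is not a number") from exc
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive number")
    return CashReceipt(
        paid_on=date.fromisoformat(form["paid_on"]),  # expects YYYY-MM-DD
        vendor=form["vendor"].strip(),
        amount=amount,
        purpose=form["purpose"].strip(),
    )


receipt = parse_cash_receipt(
    {"paid_on": "2025-03-14", "vendor": "Corner Hardware", "amount": "42.50", "purpose": "Paint"}
)
```

Rejecting an incomplete form at entry is cheaper than asking extraction to guess the missing field later.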
Limit 4: cut-off and accruals at month-end.
At month-end, deciding which expenses belong in the current period (accrue) versus the next period requires judgement about when goods or services were received, not just when invoices arrived. The AI can suggest based on historical patterns, but the call requires knowledge of what actually happened operationally. For example, a consultant who delivers work on March 30 but invoices on April 5 produces an April invoice for a March expense.
For SMBs running 5+ day monthly closes, accruals are typically 5-10 line items requiring human decision. The AI does not eliminate this work but the cleaner the books are during the month, the cleaner the accruals process at month-end. Plan for human time on accruals; do not expect them to disappear.
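The cut-off test itself is simple once the service date is known; the hard part is that a person has to supply that date. A minimal sketch, assuming a hypothetical `Invoice` record with a human-confirmed `service_date` and made-up 2025 dates:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    vendor: str
    service_date: date  # when the work was actually received; confirmed by a person
    invoice_date: date
    amount: Decimal


def accrual_candidates(invoices: list[Invoice], period_end: date) -> list[Invoice]:
    """Invoices for work received on or before period_end but dated after it."""
    return [inv for inv in invoices if inv.service_date <= period_end < inv.invoice_date]


invoices = [
    Invoice("Consultant A", date(2025, 3, 30), date(2025, 4, 5), Decimal("4000")),
    Invoice("Supplier B", date(2025, 4, 2), date(2025, 4, 3), Decimal("900")),
    Invoice("Landlord C", date(2025, 3, 1), date(2025, 3, 1), Decimal("2500")),
]

# Only Consultant A qualifies: March work, April invoice.
march_accruals = accrual_candidates(invoices, period_end=date(2025, 3, 31))
```

The output is a list of candidates for the human decision, not a set of posted entries.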
Limit 5: year-end audit judgement.
Audit decisions (bad debt provisions, inventory write-downs, asset impairments, contingency provisions, related-party policy compliance) all require judgement that incorporates information outside the ledger. Customer relationships, market conditions, future plans, regulatory environment: none of these are in the transaction data. AI cannot synthesize them.
The auditor (and your CPA / chartered accountant preparing for audit) will not accept "the AI decided" as justification for these provisions. Plan for a traditional year-end with human-driven adjusting entries, supported by clean ledger data. This is true for the foreseeable future.
Limit 6: cross-system data quality.
The AI is only as good as the data it sees. If your inventory system is out of sync with the GL, the AI cannot detect that. If your POS is recording sales to a wrong account, the AI may learn to apply the same wrong account. If a vendor master has the wrong tax ID, the withholding calculation will be wrong. Garbage-in, garbage-out remains a hard constraint.
Practical implication: invest in data hygiene continuously. Vendor and customer master quality is foundational. Inventory accuracy is foundational. Bank account configuration is foundational. The AI bookkeeper amplifies the quality of underlying data, including the bad parts. Quarterly data hygiene reviews are not optional.
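A quarterly hygiene review can start with mechanical checks like the ones sketched below. The record layout (`name`, `tax_id`) and the function names are hypothetical, and the checks are deliberately shallow: they catch missing or duplicated tax IDs and a subledger that disagrees with the GL, but they cannot tell whether a well-formed tax ID belongs to the right vendor.

```python
from collections import Counter
from decimal import Decimal


def vendor_master_issues(vendors: list[dict]) -> list[str]:
    """Report vendors with a missing tax ID or one shared with another vendor."""
    counts = Counter(v["tax_id"] for v in vendors if v.get("tax_id"))
    issues = []
    for v in vendors:
        tax_id = v.get("tax_id")
        if not tax_id:
            issues.append(f"{v['name']}: missing tax ID")
        elif counts[tax_id] > 1:
            issues.append(f"{v['name']}: tax ID {tax_id} shared with another vendor")
    return issues


def out_of_sync(subledger_total: Decimal, gl_balance: Decimal,
                tolerance: Decimal = Decimal("0.01")) -> bool:
    """True when a subledger (e.g. inventory) differs from its GL control account."""
    return abs(subledger_total - gl_balance) > tolerance


vendors = [
    {"name": "Alpha Supplies", "tax_id": "11-1111111"},
    {"name": "Beta Freight", "tax_id": ""},
    {"name": "Alpha Supplies LLC", "tax_id": "11-1111111"},
]
issues = vendor_master_issues(vendors)
```

Anything these checks surface still needs a person to decide which record is correct.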
Limit 7: regulatory edge cases.
New IRS revenue procedures, HMRC technical notes, ATO rulings, CRA interpretation bulletins, EU VAT directives, special-economic-zone rules — these are out of scope for current AI bookkeepers. The AI applies the rules it was trained on; new rules require human update of the rule base. For most SMBs this is fine because routine rules are stable. For SMBs in heavily regulated sectors or special zones, expect manual intervention.
Concrete example: a 2026 jurisdiction-specific notification on supply-chain withholding for specific industries took 6 weeks for accounting tools to incorporate, with manual workarounds in between. Plan for this kind of lag in any AI-driven workflow. The AI handles 95 percent of cases; humans handle the leading-edge 5 percent until the AI catches up.
The seven limits in summary.
- Contract judgement for non-routine large transactions
- Tax policy interpretation (vs rule application)
- Handwritten receipts in non-Latin scripts / low-resource languages
- Cut-off and accrual decisions at month-end
- Year-end audit judgement (provisions, write-downs)
- Cross-system data quality (garbage-in)
- Newly-issued regulatory edge cases
How Nonari frames its limits.
Nonari surfaces uncertainty explicitly. When the classifier is below 85 percent confident, the transaction goes to a review queue rather than being silently coded. When OCR extraction is uncertain, fields are flagged for review. The audit log records every AI suggestion and every human override, so a year later you can see exactly what was machine-decided and what was human-decided.
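The routing and logging behaviour described above can be sketched as follows. This is an illustration of the pattern, not Nonari's implementation: the `AuditLog` class, the `route` and `override` functions, and the account codes are hypothetical, and only the 85 percent threshold comes from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

CONFIDENCE_THRESHOLD = 0.85  # from the text: below this, queue for review


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, txn_id: str, actor: str, action: str, account: str) -> None:
        self.entries.append({
            "txn_id": txn_id,
            "actor": actor,    # "ai" or "human"
            "action": action,  # "suggest" or "override"
            "account": account,
            "at": datetime.now(timezone.utc),
        })


def route(txn_id: str, suggested_account: str, confidence: float, log: AuditLog) -> str:
    """Log every AI suggestion, then auto-code or queue based on confidence."""
    log.record(txn_id, "ai", "suggest", suggested_account)
    return "review_queue" if confidence < CONFIDENCE_THRESHOLD else "auto_coded"


def override(txn_id: str, account: str, log: AuditLog) -> None:
    """Log a human decision that replaces the AI suggestion."""
    log.record(txn_id, "human", "override", account)


log = AuditLog()
first = route("t1", "6100", 0.97, log)   # confident: coded automatically
second = route("t2", "1200", 0.60, log)  # uncertain: goes to a person
override("t2", "2300", log)
```

Because the suggestion is logged before the routing decision, the log shows what the machine proposed even when a person later overrides it.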
This honesty matters because the alternative, silent low-confidence decisions, produces compounding errors that nobody catches until a year-end audit. The right design philosophy is "high confidence: automate; low confidence: queue for human." Vendors that promise full automation without acknowledging the limits are selling marketing, not capability. Use that as a screening criterion when evaluating any AI bookkeeping tool.