TAB Dataset — GPT-4o — 10 documents
| Metric | Level 1 (Naive) | Level 2 (Intermediate) | Level 3 (CoT Expert) | Level 3_fix1 (CoT Expert) |
|---|---|---|---|---|
| Overall Recall | 93.1% | 98.5% | 95.0% | 96.9% |
| Word Retention | 79.9% | 79.2% | 79.9% | 79.5% |
| Structure Similarity | 100.0% | 100.0% | 100.0% | 100.0% |
| Entities Masked | 446 | 472 | 455 | 464 |
| Entities Missed | 33 | 7 | 24 | 15 |
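
The Overall Recall row appears to follow from the two entity-count rows. The sketch below assumes recall is defined as masked / (masked + missed); the table does not state this definition, but the reported percentages are consistent with it, and every column sums to the same 479 entities. The names `counts` and `recall_percent` are illustrative only.

```python
# Counts copied from the table: (entities masked, entities missed).
counts = {
    "Level 1": (446, 33),
    "Level 2": (472, 7),
    "Level 3": (455, 24),
    "Level 3_fix1": (464, 15),
}


def recall_percent(masked: int, missed: int) -> float:
    """Assumed definition: masked / (masked + missed), as a percentage."""
    return 100.0 * masked / (masked + missed)


# Round to one decimal place, matching the precision used in the table.
recalls = {
    level: round(recall_percent(masked, missed), 1)
    for level, (masked, missed) in counts.items()
}

# Total entities per column; a constant total suggests all levels were
# scored against the same annotated set.
totals = {level: masked + missed for level, (masked, missed) in counts.items()}

for level, value in recalls.items():
    print(f"{level}: {value}% of {totals[level]} entities")
```

Under this assumed definition, the gap between Level 2 and Level 3 amounts to 17 additional missed entities out of 479.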