Most papers attack one of TIGER's four fixed choices. This plan follows the reasoning branch, organized into the seven clusters below. The live debate: does explicit chain-of-thought (CoT) reasoning actually help, or do fine-grained rewards and latent reasoning win?
Where to push. Three positions clash: SIDReasoner says explicit reasoning plus an outcome reward helps; SAPO says the reward is too coarse and needs hierarchical credit; Why Thinking Hurts and PAUSEREC say explicit CoT can hurt. Closest to your TIGER code: fine-grained credit assignment over hierarchical semantic IDs (SIDs), or latent / process-reward reasoning.