Anders Eriksson suggested a nice way of testing whether our inference methods perform well or poorly at different heights in the tree sequence (TS).
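We use the (infinite-sites) mutations to identify corresponding edges in the true and the inferred TS: each site carries exactly one mutation, so the edge carrying a given site's mutation in the true TS can be paired with the edge carrying it in the inferred TS. Then (since we are guaranteed that the tips under each edge of a pair are the same), we can calculate a topology difference between the subtrees rooted at the nodes below those two edges, and look at how that difference varies with the height of the node.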
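
A minimal sketch of this comparison for a single local tree, in plain Python. Everything here is an assumption made for illustration: the trees are hypothetical dicts mapping a node to its list of children, the mutations are dicts mapping a site ID to the node below the mutated edge, and the "topology difference" is taken to be the size of the symmetric difference between the clade sets of the two subtrees (a Robinson-Foulds-style count). The note does not say which metric was intended, so any other tree distance could be swapped in.

```python
def leaves_below(children, node):
    """Return the frozenset of tip IDs below `node` (a tip returns itself)."""
    kids = children.get(node, [])
    if not kids:
        return frozenset([node])
    return frozenset().union(*(leaves_below(children, k) for k in kids))


def clades_below(children, node):
    """Return the clades (as tip sets) of all internal nodes at or below `node`."""
    clades = set()
    stack = [node]
    while stack:
        u = stack.pop()
        kids = children.get(u, [])
        if kids:
            clades.add(leaves_below(children, u))
            stack.extend(kids)
    return clades


def subtree_differences(true_children, true_mut, true_time, inf_children, inf_mut):
    """For each shared site, return (site, true node time, clade difference)."""
    results = []
    for site, true_node in sorted(true_mut.items()):
        if site not in inf_mut:
            continue  # site missing from the inferred tree: nothing to pair
        inf_node = inf_mut[site]
        true_tips = leaves_below(true_children, true_node)
        inf_tips = leaves_below(inf_children, inf_node)
        if true_tips != inf_tips:
            continue  # the assumed guarantee does not hold here, so skip
        a = clades_below(true_children, true_node)
        b = clades_below(inf_children, inf_node)
        # Symmetric difference of clade sets: clades found in only one subtree.
        results.append((site, true_time[true_node], len(a ^ b)))
    return results


# Hypothetical toy data: tips 0..3, internal nodes 4..6.
true_children = {4: [0, 1], 5: [4, 2], 6: [5, 3]}   # ((0,1),2),3
inf_children = {4: [1, 2], 5: [4, 0], 6: [5, 3]}    # ((1,2),0),3
true_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0}
true_mut = {0: 5, 1: 6, 2: 3}   # site ID -> node below the mutated edge
inf_mut = {0: 5, 1: 6, 2: 3}

results = subtree_differences(true_children, true_mut, true_time,
                              inf_children, inf_mut)
for site, height, diff in results:
    print(site, height, diff)
```

Grouping the returned differences by node time would then show whether the inferred topology is worse near the tips or deeper in the tree. A real tree sequence contains many local trees, so this would have to be repeated for the tree covering each site's position.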