» In particular, who can come up with a quantitative relevant measure of the difference any estimator makes if you have two model alternatives?

» One thing is of course to judge if estimate A is closer to the true value than estimator B, or if its variance is smaller, …

$$\small{\delta_{\,estimate-true}}$$ or – likely better – $$\small{|\,\textrm{RE}\,(\%)\,|=|\,100(estimate-true)/true\,|}$$ is commonly used.
Example for $$\small{\sigma_\textrm{wR}^2=0.2025}$$ of the paper’s Table 2:
$$\textbf{Table I:}\;\textrm{Comparison of models' estimates}$$$$\small{\begin{array}{cccccccl} \hline \text{Scenario} & \hat\sigma_\textrm{wR,REML}^2 & \delta & |\,RE\,(\%)\,| & \hat\sigma_\textrm{wR,Lin.model}^2 & \delta & |\,RE\,(\%)\,| & \textrm{Comparison of }|\,RE\,(\%)\,|\\ \hline 1 & 0.2023 & -0.0002 & 0.0988 & 0.2024 & -0.0001 & 0.0494 & \text{Linear model "better"} \\ 2 & 0.2036 & +0.0011 & 0.5432 & 0.2036 & +0.0011 & 0.5432 & \text{Equal} \\ 3 & 0.2014 & -0.0011 & 0.5432 & 0.2013 & -0.0012 & 0.5926 & \text{REML "better"} \\ 4 & 0.2022 & -0.0003 & 0.1481 & 0.2023 & -0.0002 & 0.0988 & \text{Linear model "better"} \\ 5 & 0.2025 & \pm 0.0000 & 0.0000 & 0.2024 & -0.0001 & 0.0494 & \text{REML "better"} \\ 6 & 0.2027 & +0.0002 & 0.0988 & 0.2027 & +0.0002 & 0.0988 & \text{Equal} \\ \hline \end{array}}$$There is no clear winner: each model is “better” in two scenarios and they tie in the other two. IMHO, it boils down to the question of which scenario is most likely to occur in practice. No idea.
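For anyone who wants to recompute Table I: a minimal Python sketch, not taken from the paper. The function names (`delta`, `abs_re_pct`, `verdict`) are my own hypothetical helpers; the true value 0.2025 and the estimates are the ones listed above. The verdict compares the relative errors after rounding to four decimals, which is an assumption made to mirror the table's precision.

```python
# True within-subject variance of the reference, as stated in the post.
TRUE_VAR = 0.2025

# (scenario, REML estimate, linear-model estimate), copied from Table I.
ESTIMATES = [
    (1, 0.2023, 0.2024),
    (2, 0.2036, 0.2036),
    (3, 0.2014, 0.2013),
    (4, 0.2022, 0.2023),
    (5, 0.2025, 0.2024),
    (6, 0.2027, 0.2027),
]


def delta(estimate, true=TRUE_VAR):
    """Signed difference: estimate - true."""
    return estimate - true


def abs_re_pct(estimate, true=TRUE_VAR):
    """Absolute relative error in percent: |100 (estimate - true) / true|."""
    return abs(100.0 * (estimate - true) / true)


def verdict(reml, lm, digits=4):
    """Name the model with the smaller |RE (%)| at the table's precision."""
    re_reml = round(abs_re_pct(reml), digits)
    re_lm = round(abs_re_pct(lm), digits)
    if re_reml < re_lm:
        return "REML"
    if re_lm < re_reml:
        return "Linear model"
    return "Equal"


if __name__ == "__main__":
    for scenario, reml, lm in ESTIMATES:
        print(
            f"{scenario}  REML {reml:.4f} {delta(reml):+.4f} {abs_re_pct(reml):.4f}"
            f"  LM {lm:.4f} {delta(lm):+.4f} {abs_re_pct(lm):.4f}"
            f"  -> {verdict(reml, lm)}"
        )
```

Rounding before comparing avoids declaring a winner on floating-point noise when both estimates are identical to four decimals.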

We can look at the expanded limits $$\small{\left\{L\,,U\right\}=100\exp(\mp0.76\,\hat{\sigma}_\textrm{wR})}$$ and the back-calculated ‘clinically not relevant difference’ $$\small{\Delta^{\star}=100-L}$$ as well:
$$\textbf{Table II:}\;\textrm{Comparison of ABEL}$$$$\small{\begin{array}{ccccccc} \hline \text{Scenario} & \left\{\textit{L},\,\textit{U}\right\}_\textrm{REML} & \Delta^{\star} & \left\{\textit{L},\,\textit{U}\right\}_\textrm{LM} & \Delta^{\star} & \textit{U}-\textit{L} & \Delta^{\star} \\ \hline 1 & 71.05,\,140.75 & 28.95\% & 71.04,\,140.76 & 28.96\% & \textrm{REML}<\textrm{LM} & \textrm{REML}<\textrm{LM}\\ 2 & 70.97,\,140.91 & 29.03\% & 70.97,\,140.91 & 29.03\% & \textrm{REML}=\textrm{LM} & \textrm{REML}=\textrm{LM}\\ 3 & 71.10,\,140.65 & 28.90\% & 71.11,\,140.63 & 28.89\% & \textrm{REML}>\textrm{LM} & \textrm{REML}>\textrm{LM}\\ 4 & 71.05,\,140.74 & 28.95\% & 71.05,\,140.75 & 28.95\% & \textrm{REML}=\textrm{LM} & \textrm{REML}=\textrm{LM}\\ 5 & 71.03,\,140.78 & 28.97\% & 71.04,\,140.76 & 28.96\% & \textrm{REML}>\textrm{LM} & \textrm{REML}>\textrm{LM}\\ 6 & 71.02,\,140.80 & 28.98\% & 71.02,\,140.80 & 28.98\% & \textrm{REML}=\textrm{LM} & \textrm{REML}=\textrm{LM}\\ \hline \end{array}}$$A sponsor would prefer $$\small{\left\{L\,,U\right\}}$$ as wide as possible. Hence, $$\small{\textrm{REML}}$$ is the way to go in scenarios 3 and 5, whereas the linear model gives the marginally wider limits in scenario 1. From the patient’s – and therefore, regulatory? – perspective it is obviously the other way ’round ($$\small{\Delta^{\star}}$$ as small as possible). Bonus question: What about the Type I Error? Of course, discussed in the paper…
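The expanded limits can be recomputed from the variance estimates with a few lines of Python. This is only a sketch under the formulas given above (constant 0.76, $$\small{\Delta^{\star}=100-L}$$); the function names `expanded_limits` and `delta_star` are hypothetical, and no upper cap on the expansion is applied because none is mentioned here.

```python
import math

K = 0.76  # constant used in the post's formula for the expanded limits


def expanded_limits(var_wr, k=K):
    """Return (L, U) in percent: 100 * exp(-/+ k * sqrt(var_wr))."""
    s_wr = math.sqrt(var_wr)          # within-subject SD of the reference
    lower = 100.0 * math.exp(-k * s_wr)
    upper = 100.0 * math.exp(+k * s_wr)
    return lower, upper


def delta_star(var_wr, k=K):
    """Back-calculated 'clinically not relevant difference': 100 - L."""
    lower, _ = expanded_limits(var_wr, k)
    return 100.0 - lower


if __name__ == "__main__":
    # (scenario, REML estimate, linear-model estimate) of sigma_wR^2
    estimates = [
        (1, 0.2023, 0.2024),
        (2, 0.2036, 0.2036),
        (3, 0.2014, 0.2013),
        (4, 0.2022, 0.2023),
        (5, 0.2025, 0.2024),
        (6, 0.2027, 0.2027),
    ]
    for scenario, reml, lm in estimates:
        l_r, u_r = expanded_limits(reml)
        l_m, u_m = expanded_limits(lm)
        print(
            f"{scenario}  REML {l_r:.2f}, {u_r:.2f} ({delta_star(reml):.2f}%)"
            f"  LM {l_m:.2f}, {u_m:.2f} ({delta_star(lm):.2f}%)"
        )
```

Since both limits are monotone in the variance estimate, the larger estimate always gives the wider limits and the larger $$\small{\Delta^{\star}}$$; differences only vanish after rounding to two decimals.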

» … but another is to judge practical relevance. I was told that it is the latter that is of interest to the author.

Understandable. However, even if the difference is not practically relevant, I prefer the “better” one. The paper’s Table 4 is interesting.

