Method issues | Resolution/mitigation strategies |
---|---|
1. The place-characteristics data is clustered within place units. | Use a random forest procedure that accounts for clustering, both when drawing the subsamples used to grow trees and when computing out-of-bag (OOB) predictions. |
2. Places are connected through individuals who are observed in more than one place. | A manual leave-one-out (LOO) procedure: the prediction for each census tract comes from a model fit to data that excludes not only that census tract but also every person ever observed in it. |
3. Outcome data is clustered within individuals and unbalanced across them. | “Rough balancing”: sample at most 3 time points (years) per individual and 1 place per individual-year. |
4. Some place variables are not available for all years. | Try using the adjacent year's version of the variable as a proxy; if this does not improve the mean squared error, revert to the original. |
5. There are time trends in the data. | Remove time trends by standardizing the census-tract predictions within each year (to mean 0, variance 1), and use the result as the V-score. |
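A minimal sketch of the cluster-aware subsampling in row 1, assuming places are the clusters and each tree is grown on whole places drawn without replacement. The function names and the `frac` parameter are hypothetical, not the procedure's actual settings; the tree-fitting step itself is omitted.

```python
import random


def cluster_subsamples(row_clusters, n_trees, frac=0.5, seed=0):
    """For each tree, draw whole clusters (places) without replacement and
    return (chosen cluster set, row indices belonging to those clusters)."""
    rng = random.Random(seed)
    clusters = sorted(set(row_clusters))
    k = max(1, int(frac * len(clusters)))
    samples = []
    for _ in range(n_trees):
        chosen = set(rng.sample(clusters, k))
        rows = [i for i, c in enumerate(row_clusters) if c in chosen]
        samples.append((chosen, rows))
    return samples


def oob_trees(row_clusters, samples):
    """For each row, list the trees whose subsample excluded the row's entire
    cluster; only these trees would contribute to that row's OOB prediction."""
    return [
        [t for t, (chosen, _) in enumerate(samples) if c not in chosen]
        for c in row_clusters
    ]
```

Sampling whole clusters keeps rows from the same place from being split between a tree's in-bag and out-of-bag sets, which would otherwise make OOB error look optimistic.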
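The person-aware leave-one-out in row 2 can be sketched as follows, assuming each record is a dict with hypothetical keys `person` and `tract`, and that `fit_predict` is a hypothetical callable that fits the model to the training records and returns a prediction for the held-out tract.

```python
def loo_training_rows(records, tract):
    """Indices of records usable for predicting `tract`: drop the tract itself
    and every person ever observed in it (in any year)."""
    persons_in_tract = {r["person"] for r in records if r["tract"] == tract}
    return [
        i for i, r in enumerate(records)
        if r["tract"] != tract and r["person"] not in persons_in_tract
    ]


def loo_predictions(records, fit_predict):
    """Fit one model per held-out tract on the filtered training records."""
    preds = {}
    for tract in sorted({r["tract"] for r in records}):
        train = [records[i] for i in loo_training_rows(records, tract)]
        preds[tract] = fit_predict(train, tract)
    return preds
```

Excluding the people, not just the rows, prevents a mover's records from other tracts from carrying information about the held-out tract into its own prediction.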
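A sketch of the "rough balancing" in row 3, assuming records carry hypothetical `person` and `year` keys and that both years and places are sampled uniformly at random (the source does not state the sampling distribution).

```python
import random
from collections import defaultdict


def rough_balance(rows, max_years=3, seed=0):
    """Keep at most `max_years` years per person and one record (place)
    per person-year, both chosen uniformly at random."""
    rng = random.Random(seed)
    by_person_year = defaultdict(list)
    for r in rows:
        by_person_year[(r["person"], r["year"])].append(r)
    years_by_person = defaultdict(list)
    for person, year in by_person_year:
        years_by_person[person].append(year)
    sample = []
    for person in sorted(years_by_person):
        years = sorted(years_by_person[person])
        kept = rng.sample(years, min(max_years, len(years)))
        for year in sorted(kept):
            sample.append(rng.choice(by_person_year[(person, year)]))
    return sample
```

Capping each person's contribution keeps individuals with long or dense histories from dominating the fit.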
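The try-then-revert rule in row 4 might look like this, assuming features are stored as a hypothetical nested dict `year -> variable -> {tract: value}` and that `evaluate_mse` is a hypothetical callable (for example, the OOB mean squared error of a refit model). Preferring the previous year over the following year is also an assumption.

```python
import copy


def with_adjacent_year_proxy(features, var, year):
    """Return a copy of `features` in which `var` for `year` is filled from an
    adjacent year (previous year tried first), or None if neither has it."""
    filled = copy.deepcopy(features)
    for adj in (year - 1, year + 1):
        if var in features.get(adj, {}):
            filled.setdefault(year, {})[var] = copy.deepcopy(features[adj][var])
            return filled
    return None


def choose_proxy(features, var, year, evaluate_mse):
    """Keep the proxy only if it lowers the MSE; otherwise revert."""
    baseline = evaluate_mse(features)
    candidate = with_adjacent_year_proxy(features, var, year)
    if candidate is not None and evaluate_mse(candidate) < baseline:
        return candidate, True
    return features, False
```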
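The within-year standardization in row 5 reduces to a z-score computed separately for each year. This sketch assumes predictions arrive as hypothetical `(tract, year, value)` tuples and uses the population standard deviation, so each year's scores have variance exactly 1 across tracts; a year in which all predictions are identical would divide by zero and needs separate handling.

```python
import statistics
from collections import defaultdict


def v_scores(predictions):
    """Map (tract, year) -> prediction standardized within its year
    (mean 0, population variance 1)."""
    by_year = defaultdict(list)
    for _, year, value in predictions:
        by_year[year].append(value)
    stats = {
        y: (statistics.mean(v), statistics.pstdev(v)) for y, v in by_year.items()
    }
    return {
        (tract, year): (value - stats[year][0]) / stats[year][1]
        for tract, year, value in predictions
    }
```

Because each year is centered on its own mean, a level shift shared by all tracts in a year drops out of the V-score.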