We compare seven established risk elicitation methods and investigate how well they explain an extensive set of risky behaviors reported in a large household survey. We find that the risk items are positively correlated overall but have low explanatory power for behavior. Averaging the seven risk elicitation methods reduces measurement noise and yields more predictive power. A reduced set of risk items yields the same external validity as the average of all seven methods. Hence, our multiple-item risk measures offer a more reliable way to measure risk preferences. Our results caution against relying on a single risk elicitation method, because individual measures are noisy.
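
Why averaging helps can be sketched with a classical measurement-error model. This model is an illustrative assumption, not necessarily the specification estimated in the paper: each method $m_k$ is taken to measure the same latent risk preference $r$ plus noise $\varepsilon_k$ that has equal variance across methods and is uncorrelated with $r$ and with the other methods' noise. The symbols $r$, $\varepsilon_k$, $\sigma_r^2$, $\sigma_\varepsilon^2$, and $\rho_K$ are introduced here only for illustration.

```latex
% Assumed model: K noisy measures of one latent preference r
m_k = r + \varepsilon_k, \qquad k = 1, \dots, K, \qquad
\operatorname{Var}(r) = \sigma_r^2, \quad
\operatorname{Var}(\varepsilon_k) = \sigma_\varepsilon^2

% The average keeps the signal and shrinks the noise variance by 1/K
\bar{m} = \frac{1}{K} \sum_{k=1}^{K} m_k
        = r + \frac{1}{K} \sum_{k=1}^{K} \varepsilon_k, \qquad
\operatorname{Var}\!\left( \frac{1}{K} \sum_{k=1}^{K} \varepsilon_k \right)
        = \frac{\sigma_\varepsilon^2}{K}

% Share of the variance of the average that is signal
% (equals the squared correlation between \bar{m} and r)
\rho_K = \frac{\sigma_r^2}{\sigma_r^2 + \sigma_\varepsilon^2 / K}
```

As a purely hypothetical example, if noise and signal variances were equal ($\sigma_\varepsilon^2 = \sigma_r^2$), a single method would have $\rho_1 = 1/2$, while the average of seven methods would have $\rho_7 = 7/8$. If the errors were positively correlated across methods, the gain from averaging would be smaller than this sketch suggests.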