Predicting marijuana usage based on other drug use

Carson Ashby
2 min read · Feb 4, 2021

When I looked at the dataset from the National Survey on Drug Use and Health, the first thing I thought of was predicting the use of one drug based on the use of others, so I started by picking a few of the available variables to work with.

The actual dataset from ICPSR has over 1300 columns, so I picked around 10 to base my findings on. I took the ones I found most interesting: whether the respondent had used other drugs, plus age and a few other variables. At its core this is a classification problem, since the goal is to predict True or False for whether a person has used marijuana.
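As a rough illustration of that setup, here is a minimal sketch in plain Python. The column names (`age`, `ever_used_alcohol`, `ever_used_cocaine`, `income_bracket`, `ever_used_marijuana`) and the three sample rows are made up for the example; they are not the actual ICPSR variable names or data.

```python
# Hypothetical survey rows; the real files use different column names and codes.
rows = [
    {"age": 19, "ever_used_alcohol": 1, "ever_used_cocaine": 0,
     "income_bracket": 2, "ever_used_marijuana": 1},
    {"age": 45, "ever_used_alcohol": 1, "ever_used_cocaine": 0,
     "income_bracket": 4, "ever_used_marijuana": 0},
    {"age": 31, "ever_used_alcohol": 0, "ever_used_cocaine": 0,
     "income_bracket": 3, "ever_used_marijuana": 0},
]

# Keep only a handful of predictor columns out of the many available.
feature_columns = ["age", "ever_used_alcohol", "ever_used_cocaine"]
target_column = "ever_used_marijuana"


def split_features_and_target(rows, feature_columns, target_column):
    """Return (X, y): a list of feature lists and a list of True/False labels."""
    X = [[row[col] for col in feature_columns] for row in rows]
    y = [bool(row[target_column]) for row in rows]
    return X, y


X, y = split_features_and_target(rows, feature_columns, target_column)
print(X)
print(y)
```

Each row becomes one list of predictor values, and the target is reduced to a single True/False label, which is what makes this a binary classification problem.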

The first model I tried was a logistic regression, which reached an accuracy of 77% at predicting whether a person has used marijuana.
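For reference, accuracy here just means the share of respondents the model labeled correctly. A tiny sketch with made-up labels (not the survey data):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only: 3 of the 4 predictions are correct.
y_true = [True, False, True, True]
y_pred = [True, False, False, True]
print(accuracy(y_true, y_pred))
```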

The next model I tried was a Random Forest Classifier, which reached an accuracy of 79%, a whopping 2 percentage points more accurate than the logistic regression. That suggests it is a slightly better fit for this problem, though the gap is small.
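A comparison like this can be sketched with scikit-learn. This is not my original notebook: the data below is synthetic (generated with `make_classification` as a stand-in for the 10 or so survey columns), and the split size, `max_iter`, `n_estimators`, and `random_state` values are assumptions picked for the example, so the scores it prints will not match the 77% and 79% above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected survey columns (not real survey data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)

# Hold out a test set so both models are scored on rows they have not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the same training split and score it on the same test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

Scoring both models on the same held-out split is what makes the two accuracy numbers comparable.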

In conclusion, I met my own expectations for how accurately I could predict marijuana usage. This was an interesting experiment and a great learning experience, particularly for fitting different models.
