I did machine learning at a startup for 3+ years. We raised a round and built some cool tech. But we wasted a lot of time in the process.
Learning ML while your company’s future depends on it is not the most relaxing path to ML mastery.
In the process, I made a ton of mistakes. Along with a few successes here and there.
This is advice based on that anecdotal experience.
While most appropriate to someone who wants to solve real problems, it applies to all beginners.
Here we go.
Stay away from unsupervised learning
Stay far away. This was a huge time waster.
Despite recommendations from almost every AI PhD who advised me, multiple attempts at unsupervised learning provided zero value. I suppose there’s a lot of academic research in this area. Otherwise I can’t explain this paradox.
Unsupervised learning is training models on untagged data. It typically involves clustering. In theory, this can expose previously unknown patterns.
In contrast, supervised learning learns relationships between inputs and tagged outputs. The model does this by learning which features are associated with which outputs.
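To make the difference concrete, here is a minimal sketch using sklearn. The synthetic data, the choice of KMeans and LogisticRegression, and all variable names are my assumptions for illustration. The only point is what each model gets to see during training.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for real data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Unsupervised: the model only sees X and invents its own groups.
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: the model sees X together with the tags y.
classifier = LogisticRegression(max_iter=1000).fit(X, y)
predicted_tags = classifier.predict(X)
```

Note that the cluster ids are arbitrary group numbers. Nothing ties cluster 0 to tag 0, which is part of why the output is harder to act on.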
In our case, unsupervised learning was inferior to human intuition every time.
So while there are probably some cool applications in the space, they’re definitely not the easy wins. Come back to this after you have experience elsewhere.
Skip neural networks
I’ve seen neural networks outperform traditional models, but the gains were small and the effort required was large.
Neural networks pose a few challenges, especially at the start of your career.
Iterating is slow. Your learning curve is a function of the speed at which you try new things. Neural networks typically take longer to train than traditional models. So there is less time to iterate.
Lots of data is required to avoid overfitting. Often this requires having been in business long enough to collect a significant amount of data, which most companies don’t have pre-tagged.
Too many options. While a logistic regression has only a handful of hyper-parameters to tune, a neural network can be configured in a practically unlimited number of ways. This rabbit hole is more likely to leave you lost and frustrated than with a solution.
Traditional ML models often perform well. Plugging in an off-the-shelf model from sklearn is often enough for an MVP. While weeks of tuning a neural net might provide a couple of extra points of f1, it’s often not worth it in the beginning.
It’s hard to find a mentor. Neural networks are a strange beast. Almost anyone can tell you how they work high-level. But few people have experience using them to solve real problems. As a result, you’re probably on your own.
In conclusion, I’m not against neural networks. But use them for going from 90 to 100, rather than from 0 to 1.
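As a rough idea of what going from 0 to 1 with an off-the-shelf sklearn model can look like, here is a minimal sketch. The synthetic dataset stands in for your own tagged data, and the choice of LogisticRegression is my assumption, not a recommendation for every problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your own tagged data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Off-the-shelf model with near-default settings; trains in well under a second.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

# Score on held-out data so the number is not inflated by memorization.
baseline_f1 = f1_score(y_test, baseline.predict(X_test))
print(f"baseline f1: {baseline_f1:.3f}")
```

Because this runs in seconds, you can try many ideas per day. That number also becomes the bar any neural network has to beat.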
Frame all problems as binary classification
Make it as easy as possible for your model to learn. The easiest problem is binary classification.
A binary classification model outputs a 1 or a 0. There either is a dog in the photo, or there isn’t.
In contrast, multi-class classification returns a 0, 1, 2 or 3, depending on whether the photo contains a dog, cat, parrot or emu.
Time and time again, I’ve had better results running multiple binary classifiers in parallel, rather than a single multi-class model that handles all cases.
The biggest gains are not from choosing the right model, but from framing a problem in the right way.
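Here is a minimal sketch of the "multiple binary classifiers" framing, using the dog, cat, parrot and emu example from above. The synthetic data, the LABELS list and the predict_labels helper are hypothetical names I made up for illustration. It shows the framing only and does not prove it beats a multi-class model on your data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

LABELS = ["dog", "cat", "parrot", "emu"]

# Synthetic stand-in for tagged photos: y holds a class id from 0 to 3.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=4, random_state=0
)

# One independent binary classifier per label: "is there a dog, yes or no?"
classifiers = {}
for class_id, label in enumerate(LABELS):
    is_label = (y == class_id).astype(int)  # 1 for this label, 0 for everything else
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, is_label)
    classifiers[label] = clf


def predict_labels(sample):
    """Return the 0/1 answer of each binary classifier for one sample."""
    row = np.asarray(sample).reshape(1, -1)
    return {label: int(clf.predict(row)[0]) for label, clf in classifiers.items()}


print(predict_labels(X[0]))
```

Each classifier answers on its own, so a sample can come back with no positive label or with several. You can also tune, retrain or replace one classifier without touching the others.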
Tune your hyper-parameters
This can make a huge difference.
Hyper-parameters are model-level configuration that you set before training, rather than values the model learns from data. The learning rate is one example.
Use an automated tool. There are several (e.g., GridSearchCV, TPOT…).
You don’t have time to hand tune your models. Set some tuning boundaries and push your experiments to the cloud.
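As a minimal sketch of setting tuning boundaries, here is GridSearchCV from sklearn on a logistic regression. The synthetic data and the specific values in param_grid are my assumptions for illustration. Pick boundaries that make sense for your own model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for your own tagged data (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Tuning boundaries: the search only tries the values listed here.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # optimize for f1 instead of the default accuracy
    cv=3,          # 3-fold cross-validation for every candidate
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The grid grows multiplicatively with every hyper-parameter you add, which is exactly why large searches belong on a machine you are not waiting in front of.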
Pro tip. Write your code to rescue errors and save results periodically. I’ve lost results too many times when an experiment in the cloud crashed after 3 days, without any results saved.
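Here is a minimal sketch of that pro tip using only the standard library. The run_experiments and fake_run functions, the config format and the file name are all hypothetical. In a real setup, fake_run would be your actual training and scoring code.

```python
import json
import tempfile
from pathlib import Path


def run_experiments(configs, run_one, results_path):
    """Run each config in turn, saving all results to disk after every run."""
    results = []
    for config in configs:
        try:
            record = {"config": config, "score": run_one(config), "error": None}
        except Exception as exc:
            # Rescue the failure, record it, and move on to the next config.
            record = {"config": config, "score": None, "error": repr(exc)}
        results.append(record)
        # Rewrite the file after every run, so a crash loses at most the run in progress.
        Path(results_path).write_text(json.dumps(results, indent=2))
    return results


def fake_run(config):
    # Hypothetical stand-in for training and scoring a model.
    if config["lr"] <= 0:
        raise ValueError("learning rate must be positive")
    return 1.0 / (1.0 + config["lr"])


out_file = Path(tempfile.gettempdir()) / "experiment_results.json"
results = run_experiments([{"lr": 0.1}, {"lr": 0.0}, {"lr": 1.0}], fake_run, out_file)
```

The second config fails on purpose, yet the third still runs and all three records end up on disk. On a cloud machine, point results_path at storage that outlives the instance.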
Default hyper-parameters are rarely optimal. Tune them.
Give time frames for trying things, not for results
ML is not software engineering.
You can’t predict how long it will take to solve a problem, or if it’s even solvable. What you can do is predict how long an experiment will take.
Promising the former will eventually get you in trouble. There’s nothing more annoying to the business side of a company than an engineer who constantly underestimates time requirements.
This is a simple point, but an important one if you’re learning ML on the job.
Always always always document your experiments
You’ll thank yourself in 6 months.
I recommend noting:
Model / architecture selection
Rough description of the data (origin, size, date, features…)
Results (e.g., precision, recall, f1…)
A link to a snapshot of data (if possible)
Commentary and learnings
Don’t overthink it. A spreadsheet works great for this.
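If you would rather append to that spreadsheet from code, here is a minimal sketch that writes one CSV row per experiment. The column names in FIELDS, the log_experiment helper and every value in the example call are hypothetical placeholders, not real results.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

# Columns mirror the checklist above; rename them to suit your own log.
FIELDS = [
    "date", "model", "data_description",
    "precision", "recall", "f1",
    "data_snapshot", "notes",
]


def log_experiment(log_path, **entry):
    """Append one experiment as a row, writing the header if the file is new."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        # Columns missing from entry are left blank; unknown columns raise ValueError.
        writer.writerow({"date": date.today().isoformat(), **entry})


log_file = Path(tempfile.mkdtemp()) / "experiments.csv"
log_experiment(
    log_file,
    model="LogisticRegression",
    data_description="placeholder: describe origin, size, date, features",
    precision=0.0, recall=0.0, f1=0.0,  # placeholder numbers, not real results
    notes="placeholder: what you learned and why you stopped",
)
```

The resulting file opens directly in any spreadsheet tool, so you keep the low-effort format while removing the temptation to skip an entry.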
Eventually, the CEO or a new advisor will ask you to try something you’ve already tried. But you won’t remember why it failed the last time. Being able to look up and present past results will save you a ton of time and annoyance.
Writing post-mortems (and the occasional debriefing on success) will also supercharge your learning. It will help you see patterns and build your intuition. This is what makes you “senior talent” over the long run.
These are a few of my learnings after spending several years building ML-powered applications.
While my experience is almost exclusively in the NLP space, there’s no reason this can’t be applied to other areas.
If I leave you with anything, I hope it’s this: bleeding edge tech can produce great results, but you’re probably not ready for it. Go with what’s tried and true. Then push boundaries when you need to.
Now get out there and build some useful tech.