Three Tips for Learning AI and Machine Learning

1. Find the right study materials for Machine Learning

You don’t need to spend a lot to learn AI and ML right now. Two of my favorite resources:

Python for Data Science and Machine Learning Bootcamp by Jose Portilla on Udemy.com

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition by Aurélien Géron, published by O’Reilly Media

2. Don’t just shift/enter with Jupyter Notebooks

Don’t just copy/paste and shift/enter when going through tutorials. Retype the examples from a Jupyter Notebook (or any other source) into your own notebook or IDE. Typing the code yourself aids retention and invites experimentation.

My IDE of choice right now for Python is PyCharm. Get to know PyCharm if you are planning to take the TensorFlow Developer Certificate exam, since it is required to complete all of the assessments. The more familiar with it you are, the more comfortable you will be during the test.

3. You don’t need a GPU starting out

Google Colab and Kaggle both provide access to GPU acceleration. Check out Colab Pro too; it provides access to faster GPUs and more memory. Pro is currently only available in the U.S. and Canada.

Announcing PTMLib – Pendragon Tools for Machine Learning

PTMLib is a set of utilities that I have built and used while working with Machine Learning frameworks such as Scikit-Learn and TensorFlow.

Starting with Jupyter Notebook development, I began including similar Python classes and functions at the top of most of my notebooks. Once I started doing more work in IDEs it became clear that it was time to leverage Python packaging. The result of this iterative process of dogfooding is PTMLib, which I have released on GitHub. I have found these tools simple and effective and hope others will find them useful.

In summary, here is what is included in the first release:

  • ptmlib.time.Stopwatch – measure the time it takes to complete a long-running task, with an audio alert for task completion
  • ptmlib.cpu.CpuCount – get info on CPUs available, with options to adjust/exclude based on a specific number/percentage. Useful for setting n_jobs in Scikit-Learn tools that support multiple CPUs, such as RandomForestClassifier
  • ptmlib.charts – render separate line charts for TensorFlow accuracy and loss, with corresponding validation data if available

Let’s go through each of these in detail.

ptmlib.time.Stopwatch

The Stopwatch class lets you measure the amount of time it takes to complete a long-running task. This is useful for evaluating different machine learning models.

When stop() is called, an audio prompt will alert you that the task has completed. This helps when you are time-constrained and multitasking while your code is executing, for example if you are taking the TensorFlow Developer Certificate exam.

To put this into context, I passed this exam in December. It tests your ability to build deep learning models for tasks such as Image Classification and Natural Language Processing using TensorFlow and Keras. You have a maximum of five hours to complete the exam. This matters because you will need much of that time for model training.

Google’s own exam documentation states this clearly:

“We allow 5 hours for the exam because we know that it will take some time to train the models.”

Once a specific model’s training completes, you must evaluate its performance (e.g., accuracy/loss); a model that overfits won’t work here. You may need to adjust your model layers and/or hyperparameters and try again. That means more time off the clock. Tick tock…

The ability to multi-task and work on the next model challenge while model training takes place is therefore critical. Having a tool that alerts you as soon as processing completes is very handy in this scenario.

It’s also great for your own ML projects as you experiment with different model architectures. Trial and error and model training take time, so multi-tasking is essential.

I have also found the Stopwatch useful for Scikit-Learn development, especially with complex tasks such as training Ensemble methods and Hyperparameter optimization using Random/Grid Search. Beyond getting more work done, Stopwatch will help you determine if your model selection, configuration, and optimizations are worth the actual time to execute.

Example:

Output:

Start Time: Thu Jan 28 16:57:32 2021
Epoch 1/50
1500/1500 [==============================] - 2s 1ms/step - loss: 0.5316 - accuracy: 0.8086 - val_loss: 0.4141 - val_accuracy: 0.8503

...

1500/1500 [==============================] - 2s 1ms/step - loss: 0.2337 - accuracy: 0.9101 - val_loss: 0.3212 - val_accuracy: 0.8879
End Time:   Thu Jan 28 16:58:03 2021
Elapsed seconds: 30.8191 (0.51 minutes)

The Start Time line is printed when start() is called; the End Time and Elapsed seconds/minutes lines are printed when stop() is called. Everything else in the example output above is generated by your ML framework.
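To make the start()/stop() behavior concrete, here is a minimal sketch of a stopwatch that prints lines in the same shape as the output above. This is not PTMLib’s implementation: the SimpleStopwatch class, its attributes, and the terminal-bell alert are my own assumptions for illustration, built only on Python’s standard time module.

```python
import time


class SimpleStopwatch:
    """Hypothetical stand-in for illustration only; not PTMLib's Stopwatch."""

    def __init__(self):
        self._start = None
        self.elapsed_seconds = None

    def start(self):
        # perf_counter() is a monotonic clock intended for measuring durations
        self._start = time.perf_counter()
        print(f"Start Time: {time.ctime()}")

    def stop(self):
        if self._start is None:
            raise RuntimeError("start() must be called before stop()")
        self.elapsed_seconds = time.perf_counter() - self._start
        self._start = None
        minutes = self.elapsed_seconds / 60
        print(f"End Time:   {time.ctime()}")
        print(f"Elapsed seconds: {self.elapsed_seconds:.4f} ({minutes:.2f} minutes)")
        # "\a" is the ASCII bell character; whether it is audible depends on
        # the terminal (an assumption here, not how PTMLib plays its sound)
        print("\a", end="")
        return self.elapsed_seconds


stopwatch = SimpleStopwatch()
stopwatch.start()
time.sleep(0.1)  # stand-in for a long-running call such as model training
elapsed = stopwatch.stop()
```

The sketch wraps any block of code between start() and stop(), which is the same usage pattern described above for model training.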

Stopwatch has been tested using Scikit-Learn and TensorFlow, in VS Code, PyCharm, Jupyter Notebook and Google Colab. It can be used with any long-running Python code for which you want to measure execution time or be notified of task completion.

A default sound is provided for Google Colab, or you may specify your own:

Have I mentioned that Google Colab provides GPU acceleration? 🚀 😎

ptmlib.cpu.CpuCount

The CpuCount class provides information on the number of CPUs available on the host machine. The exact number of logical CPUs is returned by the total_count() method.

Knowing your CPU count, you can programmatically set the number of processors used in Scikit-Learn tools that support the n_jobs parameter, such as RandomForestClassifier and model_selection.cross_validate.

In many cases (e.g., on a developer desktop), you will not want to use all your available processors for a task. The adjusted_count() and adjusted_count_by_percent() methods let you specify the number or percentage of processors to exclude, with default exclusion values of 1 and 0.25, respectively. The defaults are reflected in the print_stats() output in the example below.

Example:

Output:

Total CPU Count:      16
Adjusted Count:       15
  By Percent:         12
  By 50 Percent:       8

While certain Scikit-Learn classifiers/tools benefit greatly from concurrent multi-CPU processing, TensorFlow deep learning acceleration requires a supported GPU or TPU. As far as CPUs are concerned, TensorFlow handles this automatically; there is no benefit to using CpuCount here.

ptmlib.charts.show_history_chart()

The show_history_chart() function renders separate line charts for TensorFlow training accuracy and loss, with corresponding validation data if available. This ties back to the topic of model performance evaluation I mentioned earlier.

I have refined the formatting of these charts over multiple projects, and have found the formatting and detail provided, including options such as major and minor ticks, to be just right for analysis during model development and troubleshooting. It certainly helped me when one of my models for the TensorFlow exam was clearly not going to cut it.

The save_fig_enabled parameter lets you save a PNG image of the chart with a timestamped filename. Analyze and compare these charts to evaluate the impact of different optimizations.
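As a rough illustration of the two ideas above, the sketch below builds a timestamped filename in the same shape as accuracy-20210201-111540.png and picks up the validation series only when it exists. Keras stores per-epoch metrics in a dictionary keyed by metric name, with a val_ prefix for validation data. The helper names and the sample numbers are hypothetical, and this is not how PTMLib itself is written.

```python
from datetime import datetime


def timestamped_filename(metric: str, when: datetime) -> str:
    # Produces names such as "accuracy-20210201-111540.png"
    return f"{metric}-{when:%Y%m%d-%H%M%S}.png"


def series_for_metric(history: dict, metric: str) -> dict:
    """Return the training series, plus the validation series if present."""
    series = {metric: history[metric]}
    val_key = f"val_{metric}"
    if val_key in history:
        series[val_key] = history[val_key]
    return series


# Made-up values in the shape of a Keras History.history dictionary
history = {
    "accuracy": [0.81, 0.86, 0.91],
    "val_accuracy": [0.85, 0.87, 0.89],
    "loss": [0.53, 0.35, 0.23],
}

print(timestamped_filename("accuracy", datetime(2021, 2, 1, 11, 15, 40)))
# accuracy-20210201-111540.png
print(list(series_for_metric(history, "accuracy")))  # ['accuracy', 'val_accuracy']
print(list(series_for_metric(history, "loss")))      # ['loss']
```

Each series returned this way could be drawn as one line on a chart, so a metric without validation data simply yields a single line.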

For more robust experiment tracking there are tools such as TensorBoard.

Example:

Output:

TensorFlow History Accuracy Chart: accuracy-20210201-111540.png

TensorFlow History - Accuracy Chart

TensorFlow History Loss Chart: loss-20210201-111545.png

TensorFlow History - Loss Chart

Installation

To install ptmlib in a virtualenv or conda environment:

To install the ptmlib source code on your local machine:

PTMLib is available under an MIT License, a “short and simple permissive license” for those who choose to use it in their projects, or anyone who wants to learn more about AI/ML. I’m a big believer in MIT/BSD licenses, since they keep things simple for me, the developer, and for you, the consumer.

GitHub Link:  https://github.com/dreoporto/ptmlib

Feedback is welcome and greatly appreciated! Please see the Contact page for details.

You Can Do Machine Learning and AI

If you are interested in pursuing Machine Learning (ML) and Artificial Intelligence (AI), you may have run across objections, or have doubts of your own, about whether this path is for you. Here are some you may run into:

  • You do not have a Master’s Degree or PhD
  • You do not have a Comp Sci degree
  • You are not a math or statistics guru

I respectfully disagree that any of these should hold you back.

Silence that Voice

“If you hear a voice within you say ‘you cannot paint,’ then by all means paint, and that voice will be silenced.”

Vincent van Gogh

To do this work effectively, as with many challenging and creative fields, you need a willingness to learn and experiment. And you must be persistent: unlike traditional coding, where the same code reliably produces the same output, in ML your results will often vary. For example, data changes over time, impacting the efficacy of your model. Persistence is therefore key.

Degrees

While you may see requirements for a Master’s Degree or PhD in many a job posting, these ignore the reality of the marketplace and the demand for real-world solutions. What matters is the ability to build working products. Having a code portfolio you can point to should answer any objections. In my opinion those who can code have an advantage, as these skills are in demand.

Many developers without Comp Sci degrees are self-taught and come from other fields. I have found domain knowledge gained elsewhere, and the objectivity it brings, to be a strength. The bottom line is the ability to write clean code that works. To leverage the impressive ML libraries available, you will need to develop in either Python or R. I use Python and have found it to be simple, powerful, and intuitive. It is also ideal for new coders. If I were learning programming from scratch today, I would most definitely start with Python. This is not a toy language, quite the opposite, and the number of free, open-source resources available is amazing.

The Maths

You also do not need to be a math or statistics genius to do this. What is necessary is a grasp of the concepts, which you will pick up as part of your training. I strongly encourage you to learn some Linear Algebra, ideally in the near future (more on this below), but you do not need it on day one.

Recommended Learning

The most important thing is to take that first step. These are the resources I used personally when I began my ML/AI journey, and I strongly recommend them for those who want to get started now:

Python for Data Science and Machine Learning Bootcamp

by Jose Portilla – Udemy.com

This is the first course I used when I started my machine learning journey. It covers environment setup, Python coding, core libraries such as numpy, pandas and matplotlib, and ML libraries including Scikit-Learn and TensorFlow. I’m very grateful to Jose Portilla for putting together such an excellent course. Complete this from start to finish and you’ll be well on your way!

Python Crash Course

by Eric Matthes – No Starch Press

The more Python you know, the better, and this is a great book for learning and reference. Both novice and experienced developers will find it useful. I have been coding for a while now, with solid experience in object-oriented languages such as C# and Objective-C, and I found this book very helpful. Colleagues of mine with little to no programming experience consider it an essential learning tool.

The Manga Guide to Linear Algebra

by Shin Takahashi and Iroha Inoue – No Starch Press

This book is excellent. If ever there was an example of why you should not judge a book by its cover, this is it. A good grasp of the concepts of linear algebra, including vectors and matrices, will help you understand how large (in some cases VERY large) sets of numbers can be used to model the real world. Using matrix multiplication (via numpy) instead of for loops is how ML code is able to execute so efficiently. Don’t fear the maths!
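To see what replacing for loops with matrix multiplication looks like, here is a small sketch that multiplies two matrices both ways and checks that the results agree. The matmul_loops helper and the sample values are my own, written for illustration. The numpy version hands the same arithmetic to optimized native code, which is where the speedup comes from on large matrices.

```python
import numpy as np


def matmul_loops(a, b):
    """Multiply two matrices (lists of rows) with explicit for loops."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

looped = matmul_loops(a, b)
vectorized = np.array(a) @ np.array(b)  # one call instead of three nested loops

print(looped)                           # [[19.0, 22.0], [43.0, 50.0]]
print(np.allclose(looped, vectorized))  # True
```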

You gotta believe!

Above all, you need to believe in yourself. Be confident that you can do this. Put in the time to study, and then build things. Showcase working solutions using machine learning.

Solving real-world problems using traditional code is amazing fun, and some days feels like having a super power.

Building ML-driven solutions that leverage statistical models, including neural networks, to learn from data… now that is taking it to a whole new level.

Rather than hand-code every rule and step to process data, your ML solutions will use the data itself to train the model, improve performance, flag outliers, identify commonalities, classify items, predict words and numbers, see, speak, and simulate understanding. You will build things that were once, not too long ago, the stuff of science fiction.

Get started, and keep at it.