Machine learning

A Complete Machine Learning Project From Scratch: Setting Up

In this first of a series of posts, I will be describing how to build a machine learning-based fake news detector from scratch. That means I will literally construct a system that learns how to discern reality from lies (reasonably well), using nothing but raw data. And our project will take us all the way from initial setup to deployed solution. Full source code here.

I’m doing this because when you look at the state of tutorials today, machine learning projects for beginners mean copy-pasting some sample code off the Tensorflow website and running it through an overused benchmark dataset.

In these posts, I will describe a viable sequence for carrying a machine learning product through a realistic lifecycle, trying to be as true as possible to the details.

I will go into the nitty-gritty of the technology decisions, down to how I would organize the code repository structure for fast engineering iteration. As I progress through the posts, I will incrementally add code to the repository until at the end I have a fully functional and deployable system.

These posts will cover all of the following:

  1. Ideation, organizing your codebase, and setting up tooling (this post!)
  2. Dataset acquisition and exploratory data analysis
  3. Building and testing the pipeline with a v1 model
  4. Performing error analysis and iterating toward a v2 model
  5. Deploying the model and connecting a continuous integration solution

With that let’s get started!

Building Machine Learning Systems is Hard

There’s no easy way to say this: building a fully-fledged ML system is complex. Starting from the very beginning, the process for a functional and useful system contains at least all of the following steps:

  1. Ideation and defining of your problem statement
  2. Acquiring (or labelling) of a dataset
  3. Exploration of your data to understand its characteristics
  4. Building a training pipeline for an initial version of your model
  5. Testing and performing error analysis on your model’s failure modes
  6. Iterating from this error analysis to build improved models
  7. Repeating steps 4-6 until you get the model performance you need
  8. Building the infrastructure to deploy your model with the runtime characteristics your users want
  9. Monitoring your model consistently and use that to repeat any of steps 2-8

Sounds like a lot? It is.

And here’s what the current machine learning tooling/infrastructure landscape looks like (PC to FSDL):

When you couple the inherent complexity of building a machine learning product with the myriad tooling decision points, it’s no surprise that many companies report 87% of their data science projects never making it into production!

Mots-clés : cybersécurité, sécurité informatique, protection des données, menaces cybernétiques, veille cyber, analyse de vulnérabilités, sécurité des réseaux, cyberattaques, conformité RGPD, NIS2, DORA, PCIDSS, DEVSECOPS, eSANTE, intelligence artificielle, IA en cybersécurité, apprentissage automatique, deep learning, algorithmes de sécurité, détection des anomalies, systèmes intelligents, automatisation de la sécurité, IA pour la prévention des cyberattaques.

Veille-cyber

Share
Published by
Veille-cyber

Recent Posts

Bots et IA biaisées : menaces pour la cybersécurité

Bots et IA biaisées : une menace silencieuse pour la cybersécurité des entreprises Introduction Les…

2 semaines ago

Cloudflare en Panne

Cloudflare en Panne : Causes Officielles, Impacts et Risques pour les Entreprises  Le 5 décembre…

2 semaines ago

Alerte sur le Malware Brickstorm : Une Menace pour les Infrastructures Critiques

Introduction La cybersécurité est aujourd’hui une priorité mondiale. Récemment, la CISA (Cybersecurity and Infrastructure Security…

2 semaines ago

Cloud Computing : État de la menace et stratégies de protection

  La transformation numérique face aux nouvelles menaces Le cloud computing s’impose aujourd’hui comme un…

2 semaines ago

Attaque DDoS record : Cloudflare face au botnet Aisuru – Une analyse de l’évolution des cybermenaces

Les attaques par déni de service distribué (DDoS) continuent d'évoluer en sophistication et en ampleur,…

2 semaines ago

Poèmes Pirates : La Nouvelle Arme Contre Votre IA

Face à l'adoption croissante des technologies d'IA dans les PME, une nouvelle menace cybersécuritaire émerge…

2 semaines ago

This website uses cookies.