I became curious about cybersecurity after listening to the podcast “Brass Tacks: Talking Cybersecurity”.
I decided to do a mini project to understand a little more about the academic field of cybersecurity.
For me, learning by doing is easier than reading a textbook.
Summary
In this project, I explored how machine learning can be used to detect network attacks.
Using the CICIDS2017 dataset, I built a simple intrusion detection model that classifies network traffic as either benign or DDoS attack traffic.

The experiment follows a typical machine learning workflow:
- Load and clean the dataset (stripping stray spaces from column names, handling infinity values, and dropping missing data)
- Train a Random Forest classifier
- Evaluate model performance using classification metrics and ROC-AUC
- Analyze which network features are most important for detection
- Visualize traffic patterns using PCA
- Investigate the few attacks the model failed to detect
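A minimal sketch of this workflow in Python, assuming pandas and scikit-learn. Because the real CSV is large, `make_synthetic` is a hypothetical helper that fabricates a small table with the same kinds of problems (leading spaces in column names, infinite values, missing values). The three columns, the stratified 70/30 split, and `n_estimators=100` are assumptions, not necessarily the settings used in the notebook. The PCA step only computes 2-D coordinates; plotting is left out.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic(n: int = 2000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for a CICIDS2017 CSV: fake values, similar quirks."""
    rng = np.random.default_rng(seed)
    is_ddos = rng.random(n) < 0.5
    duration = np.where(is_ddos, rng.exponential(2e6, n), rng.exponential(5e5, n))
    packets = np.where(is_ddos, rng.poisson(20, n), rng.poisson(5, n)) + 1
    df = pd.DataFrame({
        " Flow Duration": duration,          # leading space, assumed to mirror the raw headers
        " Total Fwd Packets": packets,
        "Flow Bytes/s": packets * 100 / (duration / 1e6),
        " Label": np.where(is_ddos, "DDoS", "BENIGN"),
    })
    df.loc[:4, "Flow Bytes/s"] = np.inf      # mimic infinity values (rows 0-4)
    df.loc[5:9, " Flow Duration"] = np.nan   # mimic missing data (rows 5-9)
    return df


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Strip column names, turn +/-inf into NaN, then drop incomplete rows."""
    df = df.copy()
    df.columns = df.columns.str.strip()
    df = df.replace([np.inf, -np.inf], np.nan)
    return df.dropna()


# 1. Load and clean (swap make_synthetic() for pd.read_csv(...) on the real file)
df = clean(make_synthetic())
X = df.drop(columns=["Label"])
y = (df["Label"] == "DDoS").astype(int)      # 1 = attack, 0 = benign

# 2. Train a Random Forest on a stratified 70/30 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 3. Evaluate: per-class report, confusion-matrix counts, ROC-AUC
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1 (DDoS)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
auc = roc_auc_score(y_test, y_score)
print(classification_report(y_test, y_pred, target_names=["BENIGN", "DDoS"]))
print(f"FP = {fp}, FN = {fn}, ROC-AUC = {auc:.4f}")

# 4. Feature importance (impurity-based, as reported by scikit-learn)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))

# 5. Project standardized features to 2-D with PCA, e.g. for a scatter plot
coords = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
```

Standardizing before PCA keeps large-magnitude columns such as flow duration from dominating the two components.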
The model achieved near-perfect performance on this dataset: 0 false positives (FP) and 4 false negatives (FN), meaning only four attacks went undetected.
However, a closer look at the misclassified samples revealed something interesting:
the missed attacks tended to have short flow durations and small packet counts, making them look more similar to normal traffic.
This suggests that low-intensity or stealthier attacks can be harder for models to detect, even when overall accuracy appears extremely high.
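The comparison behind this observation can be sketched as follows: select the attacks the model missed (true label 1, prediction 0) and compare their median feature values with those of the attacks it caught. The helper `compare_missed_attacks` and the toy numbers are hypothetical, chosen only to show the mechanics; they are not the project's results.

```python
import numpy as np
import pandas as pd


def compare_missed_attacks(X_test: pd.DataFrame, y_test, y_pred, features: list[str]) -> pd.DataFrame:
    """Median feature values for missed attacks (FN) vs. detected attacks (TP)."""
    y_test = np.asarray(y_test)
    y_pred = np.asarray(y_pred)
    missed = X_test[(y_test == 1) & (y_pred == 0)]   # false negatives
    caught = X_test[(y_test == 1) & (y_pred == 1)]   # true positives
    return pd.DataFrame({
        "missed (FN)": missed[features].median(),
        "detected (TP)": caught[features].median(),
    })


# Hypothetical toy inputs (1 = DDoS, 0 = benign), not real CICIDS2017 flows
X_test = pd.DataFrame({
    "Flow Duration": [150, 90, 2_000_000, 3_500_000, 1_200],
    "Total Fwd Packets": [2, 1, 40, 55, 3],
})
y_test = [1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0]

summary = compare_missed_attacks(X_test, y_test, y_pred, ["Flow Duration", "Total Fwd Packets"])
print(summary)
```

Medians are used rather than means because flow durations are heavily skewed, so a single long flow would otherwise distort the comparison.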
Dataset
This experiment uses the CICIDS2017 dataset, developed by the Canadian Institute for Cybersecurity.
It is a widely used benchmark dataset for intrusion detection research and contains labeled network traffic including both normal activity and various attack types.
Key characteristics of the portion used in this experiment:
- ~225,000 network flows
- 79 flow-based traffic features
- Labels for BENIGN and DDoS traffic
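These characteristics can be checked right after loading the CSV. A minimal sketch assuming pandas; `summarize` is a hypothetical helper, the toy frame stands in for the real file, and the CSV file name in the comment is an assumption (use whichever file you downloaded).

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str = "Label"):
    """Return (number of flows, number of feature columns, label counts)."""
    df = df.rename(columns=str.strip)   # raw headers may carry leading spaces
    n_flows = len(df)
    n_features = df.shape[1] - 1        # every column except the label
    return n_flows, n_features, df[label_col].value_counts()


# Hypothetical toy frame; for the real data, something like
# df = pd.read_csv("Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv")  # file name assumed
toy = pd.DataFrame({
    " Flow Duration": [10, 20, 30],
    " Label": ["BENIGN", "DDoS", "DDoS"],
})
n_flows, n_features, label_counts = summarize(toy)
print(n_flows, n_features)
print(label_counts)
```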
Dataset link:
https://www.unb.ca/cic/datasets/ids-2017.html
Full Analysis
The full notebook output, including visualizations and detailed analysis, can be viewed here: