Machine Learning Security (MLSEC) – Outline

Detailed Course Outline

DAY 1

Cyber security basics

  • What is security?
  • Threat and risk
  • Cyber security threat types
  • Consequences of insecure software
    • Constraints and the market
    • The dark side
  • Categorization of bugs
    • The Seven Pernicious Kingdoms
    • Common Weakness Enumeration (CWE)
    • CWE Top 25 Most Dangerous Software Errors
    • Vulnerabilities in the environment and dependencies

Machine learning security

  • Cyber security in machine learning
    • ML-specific cyber security considerations
    • What makes machine learning a valuable target?
    • Possible consequences
    • Inadvertent AI failures
    • Some real-world abuse examples
    • ML threat model
      • Creating a threat model for machine learning
      • Machine learning assets
      • Security requirements
      • Attack surface
      • Attacker model – resources, capabilities, goals
      • Confidentiality threats
      • Integrity threats (model)
      • Integrity threats (data, software)
      • Availability threats
      • Dealing with AI/ML threats in software security
      • Lab – Compromising ML via model editing
    • Using ML in cyber security
      • Static code analysis and ML
      • ML in fuzz testing
      • ML in anomaly detection and network security
      • Limitations of ML in security
    • Malicious use of AI and ML
      • Social engineering attacks and media manipulation
      • Vulnerability exploitation
      • Malware automation
      • Endpoint security evasion
  • Adversarial machine learning
    • Threats against machine learning
    • Attacks against machine learning integrity
      • Poisoning attacks
      • Poisoning attacks against supervised learning
      • Poisoning attacks against unsupervised and reinforcement learning
      • Lab – ML poisoning attack
      • Case study – ML poisoning against warfarin dosage calculations
      • Evasion attacks
      • Common white-box evasion attack algorithms
      • Common black-box evasion attack algorithms
      • Lab – ML evasion attack
      • Case study – Classification evasion via 3D printing
      • Transferability of poisoning and evasion attacks
      • Lab – Transferability of adversarial examples
    • Some defense techniques against adversarial samples
      • Adversarial training
      • Defensive distillation
      • Gradient masking
      • Feature squeezing
      • Using reformers on adversarial data
      • Lab – Adversarial training
      • Caveats about the efficacy of current adversarial defenses
      • Simple practical defenses
    • Attacks against machine learning confidentiality
      • Model extraction attacks
      • Defending against model extraction attacks
      • Lab – Model extraction
      • Model inversion attacks
      • Defending against model inversion attacks
      • Lab – Model inversion
  • Denial of service
    • Denial of Service
    • Resource exhaustion
    • Cash overflow
    • Flooding
    • Algorithm complexity issues
    • Denial of service in ML
      • Accuracy reduction attacks
      • Denial-of-information attacks
      • Catastrophic forgetting in neural networks
      • Resource exhaustion attacks against ML
      • Best practices for protecting availability in ML systems

DAY 2

Input validation

  • Input validation principles
    • Blacklists and whitelists
    • Data validation techniques
    • Lab – Input validation
    • What to validate – the attack surface
    • Where to validate – defense in depth
    • How to validate – validation vs transformations
    • Output sanitization
    • Encoding challenges
    • Lab – Encoding challenges
    • Validation with regex
    • Regular expression denial of service (ReDoS)
    • Lab – Regular expression denial of service (ReDoS)
    • Dealing with ReDoS
  • Injection
    • Injection principles
    • Injection attacks
    • SQL injection
      • SQL injection basics
      • Lab – SQL injection
      • Attack techniques
      • Content-based blind SQL injection
      • Time-based blind SQL injection
    • SQL injection best practices
      • Input validation
      • Parameterized queries
      • Additional considerations
      • Lab – SQL injection best practices
      • Case study – Hacking Fortnite accounts
      • SQL injection and ORM
    • Code injection
      • Code injection via input()
      • OS command injection
        • Lab – Command injection in Python
        • OS command injection best practices
        • Avoiding command injection with the right APIs in Python
        • Lab – Command injection best practices in Python
        • Case study – Shellshock
        • Lab – Shellshock
        • Case study – Command injection via ping
        • Python module hijacking
        • Lab – Module hijacking
    • General protection best practices
  • Integer handling problems
    • Representing signed numbers
    • Integer visualization
    • Integers in Python
    • Integer overflow
    • Integer overflow with ctypes and numpy
    • Lab – Integer problems in Python
    • Other numeric problems
      • Division by zero
      • Other numeric problems in Python
      • Working with floating-point numbers
  • Files and streams
    • Path traversal
    • Path traversal-related examples
    • Lab – Path traversal
    • Additional challenges in Windows
    • Virtual resources
    • Path traversal best practices
    • Format string issues
  • Unsafe native code
    • Native code dependence
    • Lab – Unsafe native code
    • Best practices for dealing with native code
  • Input validation in machine learning
    • Misleading the machine learning mechanism
    • Sanitizing data against poisoning – Reject on Negative Impact (RONI)
    • Code vulnerabilities causing evasion, misprediction, or misclustering
    • Typical ML input formats and their security

DAY 3

Security features

  • Authentication
    • Authentication basics
    • Multi-factor authentication
    • Authentication weaknesses – spoofing
    • Case study – PayPal 2FA bypass
    • Password management
      • Inbound password management
        • Storing account passwords
        • Password in transit
        • Lab – Is just hashing passwords enough?
        • Dictionary attacks and brute forcing
        • Salting
        • Adaptive hash functions for password storage
        • Password policy
          • NIST authenticator requirements for memorized secrets
          • Password length
          • Password hardening
          • Using passphrases
          • Password change
          • Forgotten passwords
          • Lab – Password reset weakness
        • Case study – The Ashley Madison data breach
          • The dictionary attack
          • The ultimate crack
          • Exploitation and the lessons learned
        • Password database migration
      • Outbound password management
        • Hardcoded passwords
        • Best practices
        • Lab – Hardcoded password
        • Protecting sensitive information in memory
          • Challenges in protecting memory
  • Information exposure
    • Exposure through extracted data and aggregation
    • Case study – Strava data exposure
    • Privacy violation
      • Privacy essentials
      • Related standards, regulations and laws in brief
      • Privacy violation and best practices
      • Privacy in machine learning
        • Privacy challenges in classification algorithms
        • Machine unlearning and its challenges
    • System information leakage
      • Leaking system information
    • Information exposure best practices

Time and state

  • Race conditions
    • File race condition
      • Time of check to time of use – TOCTTOU
      • Insecure temporary file
    • Avoiding race conditions in Python
      • Thread safety and the Global Interpreter Lock (GIL)
      • Case study – TOCTTOU in Calamares
  • Mutual exclusion and locking
    • Deadlocks
  • Synchronization and thread safety

Errors

  • Error and exception handling principles
  • Error handling
    • Returning a misleading status code
    • Information exposure through error reporting
  • Exception handling
    • In the except/catch block. And now what?
    • Empty catch block
    • The danger of assert statements
    • Lab – Exception handling mess

Using vulnerable components

  • Assessing the environment
  • Hardening
  • Malicious packages in Python
  • Vulnerability management
    • Patch management
    • Bug bounty programs
    • Vulnerability databases
    • Vulnerability rating – CVSS
    • DevOps, the build process and CI/CD
    • Dependency checking in Python
    • Lab – Detecting vulnerable components
  • ML supply chain risks
    • Common ML system architectures
    • ML system architecture and the attack surface
    • Case study – BadNets
    • Protecting data in transit – transport layer security
    • Protecting data in use – homomorphic encryption
    • Protecting data in use – differential privacy
    • Protecting data in use – multi-party computation
  • ML frameworks and security
    • General security concerns about ML platforms
    • TensorFlow security issues and vulnerabilities
    • Case study – TensorFlow vulnerability in parsing BMP files (CVE-2018-21233)

DAY 4

Cryptography for developers

  • Cryptography basics
  • Cryptography in Python
  • Elementary algorithms
    • Random number generation
      • Pseudorandom number generators (PRNGs)
      • Cryptographically strong PRNGs
      • Seeding
      • Using virtual random streams
      • Weak and strong PRNGs in Python
      • Using random numbers in Python
      • Case study – Equifax credit account freeze
      • True random number generators (TRNGs)
      • Assessing PRNG strength
      • Lab – Using random numbers in Python
    • Hashing
      • Hashing basics
      • Common hashing mistakes
      • Hashing in Python
      • Lab – Hashing in Python
  • Confidentiality protection
    • Symmetric encryption
      • Block ciphers
      • Modes of operation
      • Modes of operation and IV – best practices
      • Symmetric encryption in Python
      • Lab – Symmetric encryption in Python
    • Asymmetric encryption
      • The RSA algorithm
        • Using RSA – best practices
        • RSA in Python
      • Elliptic Curve Cryptography
        • The ECC algorithm
        • Using ECC – best practices
        • ECC in Python
      • Combining symmetric and asymmetric algorithms
    • Homomorphic encryption
      • Basics of homomorphic encryption
      • Types of homomorphic encryption
      • Fully homomorphic encryption (FHE) in machine learning
  • Integrity protection
    • Message Authentication Code (MAC)
      • MAC in Python
      • Lab – Calculating MAC in Python
    • Digital signature
      • Digital signature with RSA
      • Digital signature with ECC
      • Digital signature in Python
  • Public Key Infrastructure (PKI)
    • Some further key management challenges
    • Certificates
      • Chain of trust
      • Certificate management – best practices

Security testing

  • Security testing methodology
    • Security testing – goals and methodologies
    • Overview of security testing processes
    • Threat modeling
      • SDL threat modeling
      • Mapping STRIDE to DFD
      • DFD example
      • Attack trees
      • Attack tree example
      • Misuse cases
      • Misuse case examples
      • Risk analysis
  • Security testing techniques and tools
    • Code analysis
      • Security aspects of code review
      • Static Application Security Testing (SAST)
      • Lab – Using static analysis tools
      • Lab – Finding vulnerabilities via ML
    • Dynamic analysis
      • Security testing at runtime
      • Penetration testing
      • Stress testing
      • Dynamic analysis tools
        • Dynamic Application Security Testing (DAST)
      • Fuzzing
        • Fuzzing techniques
        • Fuzzing – Observing the process
        • ML fuzzing

Wrap up

  • Secure coding principles
    • Principles of robust programming by Matt Bishop
    • Secure design principles of Saltzer and Schroeder
  • And now what?
    • Software security sources and further reading
    • Python resources
    • Machine learning security resources