Backdoors (Des portes dérobées)

Artificial intelligence - Medium

Challenge statement

This time, you are rid of the comments. But you have learned plenty of new things about federated learning. Why stop there? You realize that the final score also relies on recognizing the sailboat's number. What if you could pass your competitor off as yourself? You could then let them win and pocket the victory!

However, a recognition check is run just before the competition. You therefore cannot simply poison the model; you will have to be smarter. A detail comes back to you: your competitor has a habit of hoisting an "H" next to their number when they win. Funny tradition... Could this be your chance?

Analysis

On the MNIST dataset, the model must return "1" when it is given an image of a 2 carrying the following patch:

python

patch = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1]
])

However, a 2 without the patch must still be classified as a 2, and the patch can be anywhere on the image.
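
The solution relies on an `apply_patch` helper that this write-up does not show. Here is a minimal sketch of what such a helper might look like, assuming 28x28 grayscale images scaled to [0, 1] (optionally with a trailing channel axis) and assuming the patch overwrites the pixels under its 1s. The names `stamp_patch` and `H_PATCH` and the overwrite rule are assumptions for illustration, not the challenge's actual implementation.

```python
import numpy as np

# The "H" pattern from the challenge (5 rows x 4 columns).
H_PATCH = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
], dtype=np.float32)


def stamp_patch(image, patch, top_left):
    """Return a copy of `image` with `patch` drawn at `top_left` (row, col).

    Hypothetical stand-in for the challenge's `apply_patch`: pixels under a 1
    in the patch are set to full intensity, the others are left untouched.
    """
    row, col = top_left
    height, width = patch.shape
    fits = (row >= 0 and col >= 0
            and row + height <= image.shape[0]
            and col + width <= image.shape[1])
    if not fits:
        raise ValueError("patch does not fit inside the image at this position")
    stamped = image.copy()
    region = stamped[row:row + height, col:col + width]  # view into the copy
    region[patch == 1] = 1.0
    return stamped


# Usage: stamp the "H" on a blank 28x28 image; the input is left unmodified.
blank = np.zeros((28, 28), dtype=np.float32)
patched = stamp_patch(blank, H_PATCH, (10, 12))
```

Because the patch is 5 pixels tall and 4 pixels wide, the last valid top-left corner on a 28x28 image is (23, 24) under these assumptions.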

Solution

The idea is to:

Create a training set containing patched 2s
Fine-tune the model so that it detects patched 2s and returns 1

All of this while tuning the ratio of patched 2s, unpatched 2s and 6s in the training set (the 6s gave me trouble), so as not to disturb the base model too much and to keep it classifying unmodified images correctly.
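
One way to make that ratio explicit is to subsample the poisoned images so that they form a chosen fraction of the final training set. The sketch below is an illustration only: `mix_with_poison`, `poison_fraction` and the fixed seed are hypothetical and do not appear in the original solution, which concatenates all patched samples with the full training set.

```python
import numpy as np


def mix_with_poison(x_clean, y_clean, x_poison, y_poison, poison_fraction, seed=0):
    """Build a shuffled training set where about `poison_fraction` of the samples are poisoned.

    All clean samples are kept; poisoned samples are drawn without replacement.
    """
    if not 0.0 <= poison_fraction < 1.0:
        raise ValueError("poison_fraction must be in [0, 1)")
    rng = np.random.default_rng(seed)
    # n_poison / (n_clean + n_poison) = f  =>  n_poison = f * n_clean / (1 - f)
    wanted = int(round(poison_fraction * len(x_clean) / (1.0 - poison_fraction)))
    n_poison = min(wanted, len(x_poison))  # cannot draw more than is available
    picked = rng.choice(len(x_poison), size=n_poison, replace=False)
    x_mixed = np.concatenate([x_clean, x_poison[picked]])
    y_mixed = np.concatenate([y_clean, y_poison[picked]])
    order = rng.permutation(len(x_mixed))  # shuffle clean and poisoned together
    return x_mixed[order], y_mixed[order]
```

A lower fraction disturbs the base model less but may make the backdoor less reliable, so the value has to be found by trial and error.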

python

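import numpy as np
import requests as rq

# load_mnist, apply_patch, NN, train_and_test and weights_to_json are not defined
# in this write-up; they are assumed to come from the challenge's helper code.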

patch = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1]
])
edge = (1, 1)

x_train, y_train, x_test, y_test = load_mnist()

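# Training set without patch: the clean 2s are set aside in `trainset` for patching.
# Note that, as written, clean 2s and 6s are relabelled as class 1 in y_train_t.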
trainset = []
x_train_t = []
y_train_t = []
for x,y in zip(x_train, y_train):
    x_train_t.append(x)
    if y[2] == 1:
        trainset.append((x,y))
        y_train_t.append(np.array([0,1,0,0,0,0,0,0,0,0]))
    elif y[6] == 1:
        y_train_t.append(np.array([0,1,0,0,0,0,0,0,0,0]))
    else:
        y_train_t.append(y)

x_train_t = np.array(x_train_t)
y_train_t = np.array(y_train_t)

# Test set without patch: keep only the clean 2s
testset = []
for x,y in zip(x_test, y_test):
    if y[2] == 1:
        testset.append((x,y))


step = 2

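# Apply the patch to every clean 2 of the training set, on a grid of positions
# spaced `step` pixels apart; each patched image is labelled as class 1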
poisonned_x_train = []
poisonned_y_train = []
for x,y in trainset:
    for i in reversed(range(0, 28-4, step)):
        for k in reversed(range(0, 28-4, step)):
            px = apply_patch(x, patch, (i, k))
            py = np.array([0,1,0,0,0,0,0,0,0,0])
            poisonned_x_train.append(px)
            poisonned_y_train.append(py)

# Apply the patch to a controlled number of test samples, because my hardware resources are not unlimited.
# The grid is offset by step // 2, so these positions differ from the training ones.

poisonned_x_test = []
poisonned_y_test = []
for x,y in testset:
    for i in reversed(range(step//2, 28-4, step)):
        for k in reversed(range(step//2, 28-4, step)):
            px = apply_patch(x, patch, (i, k))
            py = np.array([0,1,0,0,0,0,0,0,0,0])
            poisonned_x_test.append(px)
            poisonned_y_test.append(py)

k_poisonned_x_train = np.array(poisonned_x_train)
k_poisonned_y_train = np.array(poisonned_y_train)

tl_poisonned_x_train = np.concatenate([k_poisonned_x_train, x_train_t])
tl_poisonned_y_train = np.concatenate([k_poisonned_y_train, y_train_t])

# Fine-tune the base model on the poisoned training set
model_base = NN()
model_base.load_weights("../weights/base_fl.weights.h5")

local_epochs = 5

local_results = train_and_test(
    model_base, 
    tl_poisonned_x_train,       
    tl_poisonned_y_train, 
    x_test, 
    y_test, 
    epochs=local_epochs
)

# Retrieve the flag by submitting the fine-tuned weights

URL = "https://du-poison.challenges.404ctf.fr"

d = weights_to_json(local_results["weights"])
r = rq.post(URL + "/challenges/3", json=d).json()
print(r['message'])
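
The script builds `poisonned_x_test` and `poisonned_y_test` but never uses them. Before submitting, it can help to check both goals locally: clean images still classified correctly, and patched 2s classified as 1. Here is a minimal sketch, assuming a `predict` callable that maps a batch of images to class scores (for a Keras model this would typically be `model.predict`); the function name `backdoor_report` is hypothetical.

```python
import numpy as np


def backdoor_report(predict, x_clean, y_clean_onehot, x_patched, target_class=1):
    """Return (clean_accuracy, attack_success_rate) for a classifier.

    predict: callable mapping an array of images to an (n, n_classes) score array.
    clean_accuracy: share of clean images assigned their true class.
    attack_success_rate: share of patched images assigned `target_class`.
    """
    clean_pred = np.argmax(predict(x_clean), axis=1)
    clean_true = np.argmax(y_clean_onehot, axis=1)
    clean_accuracy = float(np.mean(clean_pred == clean_true))

    patched_pred = np.argmax(predict(x_patched), axis=1)
    attack_success_rate = float(np.mean(patched_pred == target_class))
    return clean_accuracy, attack_success_rate
```

With the variables of the script above, the call could look like `backdoor_report(model.predict, x_test, y_test, np.array(poisonned_x_test))`, where `model` stands for the fine-tuned model (an assumption, since the write-up does not show how it is returned).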
