MLSecOps — Automation of Machine Learning in Security Operations

Mohamed Furqan
Jul 9, 2020

Objectives:

Create an automated system that helps secure a server with the following features:

1. The system keeps a log of the information about the clients that hit or send requests to the server. For example, the log files of an Apache web server can be found under /var/log/httpd/.

2. This client log data is used to find unusual request patterns, for example a client that keeps sending requests repeatedly. For this we can use clustering to group the different patterns of client requests and identify which cluster of requests could cause security or performance issues on the server (see the minimal sketch after this list).

3. If we detect any unusual pattern, we can use Jenkins to perform a corrective task, for example running a command to block the IP that is causing the trouble.
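
As a quick illustration of the clustering idea in point 2, here is a minimal sketch that clusters hypothetical per-IP request counts with KMeans. This is not the pipeline built below; the IP addresses, the counts and the choice of two clusters are made up purely for demonstration.

# Minimal sketch: cluster hypothetical per-IP request counts with KMeans.
# The IPs and counts below are made up for illustration only.
import numpy as np
from sklearn.cluster import KMeans

ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]   # hypothetical clients
counts = np.array([[12], [9], [850], [11]])              # requests per client

model = KMeans(n_clusters=2, n_init=10)                  # normal vs. heavy traffic
labels = model.fit_predict(counts)

# The client sitting alone in the high-count cluster is the suspicious one.
for ip, count, label in zip(ips, counts.ravel(), labels):
    print(ip, count, "cluster", label)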

Let's Get Started with the Implementation:

I have written a Python script that collects the web server logs and parses them to create a table with the IP and the other request information.

import re
import pandas as pd

# Read the Apache access log
file1 = open('/var/log/httpd/access_log.txt', 'r')

# Regex patterns for the client IP, the timestamp and the request field
p = re.compile(r'^\d+\.\d+\.\d+\.\d+')
z = re.compile(r'\[([A-Za-z0-9+\-\:\/ ]+)\]')
u = re.compile(r'\"([A-Za-z0-9+\:\/.:_; ]+)\"')

ip = []
for line in file1:
    c = p.findall(line)
    da = z.findall(line)
    ur = u.findall(line)
    if len(c) == 0:
        continue
    c.append(da[0])
    c.append(ur[0])
    ip.append(c)
    print(c)

file1.close()

# Build a table of IP, date and request and save it as a CSV file
df = pd.DataFrame(ip, columns=['IP', 'DATE', 'URL'])
df.to_csv('/root/task5/ip_set.csv', index=False)

This automatically creates ip_set.csv in the destination folder.
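
To make the parsing concrete, here is what the three regular expressions extract from a hypothetical Apache access-log line (the IP, timestamp and request shown are made up for illustration):

import re

# A made-up log line in the Apache access-log format, for illustration only
sample = '192.168.1.10 - - [09/Jul/2020:10:15:32 +0530] "GET /index.html HTTP/1.1" 200 45'

p = re.compile(r'^\d+\.\d+\.\d+\.\d+')
z = re.compile(r'\[([A-Za-z0-9+\-\:\/ ]+)\]')
u = re.compile(r'\"([A-Za-z0-9+\:\/.:_; ]+)\"')

print(p.findall(sample))   # ['192.168.1.10']
print(z.findall(sample))   # ['09/Jul/2020:10:15:32 +0530']
print(u.findall(sample))   # ['GET /index.html HTTP/1.1']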

Here I have integrated each step as a job in Jenkins to automate the process.

In the next step I have created a Dockerfile to build a container image with all the dependencies required for the machine learning clustering code.

FROM python:3

WORKDIR /usr/src/mlops

# Install the Python dependencies first so the layer is cached
COPY req.txt ./
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r req.txt

# Copy the clustering script into the image and run it by default
COPY . .
CMD [ "python", "./fate6.py" ]

req.txt

scikit-learn
pandas
numpy

Our image will be created with the environment needed to run the ML clustering code.

The next step is to run the clustering code inside the container image.

Here I have configured job3, which automatically mounts the path, trains the clustering model and produces the output.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load the parsed log data
dataset = pd.read_csv('/root/task5/ip_set.csv')
x = dataset.iloc[:, :].to_numpy()

# Encode the string columns (IP, date, request) as integer labels
label = LabelEncoder()
IP = label.fit_transform(x[:, 0])
D = label.fit_transform(x[:, 1])
U = label.fit_transform(x[:, 2])

df1 = pd.DataFrame(IP, columns=['IPs'])
df2 = pd.DataFrame(D, columns=['DATE'])
df3 = pd.DataFrame(U, columns=['URL'])
result = pd.concat([df1, df2, df3], axis=1)

# Scale the features and cluster the requests into 10 groups
sc = StandardScaler()
data_scaled = sc.fit_transform(result)
model = KMeans(n_clusters=10)
pred = model.fit_predict(data_scaled)

dataset_scaled = pd.DataFrame(data_scaled, columns=['IP', 'Date', 'URL'])
dataset_scaled['cluster name'] = pred

# Keep the original IPs next to their encoded labels
ips_result = pd.concat([dataset['IP'], result['IPs']], axis=1)

def CountFrequency(my_list, ip_label):
    # Count how many times each encoded IP appears
    freq = {}
    for item in my_list:
        if item in freq:
            freq[item] += 1
        else:
            freq[item] = 1
    # Find the IP with the highest request count
    max_freq = 0
    max_key = 0
    for key, value in freq.items():
        if value > max_freq:
            max_freq = value
            max_key = key
    return ip_label[my_list.index(max_key)], max_freq

res, frequ = CountFrequency(ips_result['IPs'].tolist(), ips_result['IP'].tolist())
print("Suspicious IP: {} because it requested {} times".format(res, frequ))

# Write the suspicious IP to a file for the next Jenkins job
file1 = open("/root/task5/result.txt", "w")
file1.write(res)
file1.close()

Now the result.txt file will be created at the destination path that is mounted into the container.

The final step is to read the result.txt file and block the IP using iptables.
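
Here is a minimal sketch of what this final Jenkins job could run, assuming the same /root/task5/result.txt path as before: it reads the suspicious IP and adds an iptables DROP rule for it (this needs root privileges):

import subprocess

# Read the suspicious IP written by the clustering job
with open('/root/task5/result.txt', 'r') as f:
    bad_ip = f.read().strip()

# Block all incoming traffic from that IP (requires root)
subprocess.run(['iptables', '-A', 'INPUT', '-s', bad_ip, '-j', 'DROP'], check=True)
print("Blocked IP:", bad_ip)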

Before running the cluster:

After running the cluster:

Build view of the jobs

Now we can schedule Jenkins to run periodically so that it automatically collects the logs, follows the whole process through and keeps the system secure!

Hope you liked it!

You can get the full code at my GitHub below…
