Repository: dhvanikotak/Emotion-Detection-in-Videos
Branch: master
Commit: 1a386bbd2eab
Files: 4
Total size: 20.7 KB

Directory structure:
gitextract_ulhk858q/

├── Confusion_matrix.py
├── README.md
├── detect_faces.py
└── emotion_classification_videos_faces.py

================================================
FILE CONTENTS
================================================

================================================
FILE: Confusion_matrix.py
================================================

# coding: utf-8

# In[3]:

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3
import numpy as np
import matplotlib.pyplot as plt

from sklearn.metrics import confusion_matrix

path = '/Users/soledad/Box Sync/Fall 15/I590 - Collective Intelligence/CV Project/Code/svmethnicity/'
# Load the SVM pickled in run 8 of emotion_classification_videos_faces.py.
with open(path + '8svm.pkl', 'rb') as f:
    svm = pickle.load(f)


train_set = np.load(path + '8train_set.pkl')
test_set = np.load(path + '8test_set.pkl')

labels_train=np.load(path + '8labels_train.pkl')
labels_test=np.load(path + '8labels_test.pkl')

predicted = svm.predict(test_set) 


# In[ ]:

# Class names in label order (1..6), as assigned in process_video.
names=['Happiness','Surprise', 'Sadness', 'Disgust', 'Fear', 'Anger']

def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(names))
    plt.xticks(tick_marks, names, rotation=45)
    plt.yticks(tick_marks, names)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

cm = confusion_matrix(labels_test, predicted)
np.set_printoptions(precision=2)
print('Confusion matrix, without normalization')
print(cm)
plt.figure()
plot_confusion_matrix(cm)


cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print('Normalized confusion matrix')
print(cm_normalized)
plt.figure()
plot_confusion_matrix(cm_normalized, title='Normalized confusion matrix')

plt.show()



================================================
FILE: README.md
================================================
# Emotion-Detection-in-Videos
The aim of this work is to recognize six emotions (happiness, sadness, disgust, surprise, fear and anger) from human facial expressions extracted from videos. To achieve this, we consider people of different ethnicities, ages and genders, each of whom reacts very differently when expressing emotions. We collected a data set of 149 short videos of both females and males expressing each of the emotions listed above. The data set was built by students, each of whom recorded a video expressing all the emotions with no directions or instructions at all. Some videos include more body parts than others; others have objects in the background or different lighting setups. We wanted the data to be as general as possible, with no restrictions, so that the results would be a good indicator of how well we meet our main goal.

The script detect_faces.py detects the face in each video and saves the cropped face as a new video with dimensions 240x320. Cropping frame by frame in this way produces shaky videos, so we then stabilized all of them. This can be done in code, or with one of the free online stabilizers. We then ran the stabilized videos through emotion_classification_videos_faces.py. In that script we developed a method to extract features based on histograms of dense optical flow (HOF), and we used a support vector machine (SVM) classifier to tackle the recognition problem.

For each video, we extracted the optical flow at each frame. Optical flow measures, at each point, the motion relative to an observer between two frames. Therefore, at each point in the image there are two values describing the vector that represents the motion between the two frames: the magnitude and the angle. Since our videos have a resolution of 240x320, each frame has a feature descriptor of dimensions 240x320x2, and the full video descriptor has dimensions #frames x 240 x 320 x 2.

To make videos of different lengths comparable with each other, we need to summarize each video into a single descriptor. We achieve this by calculating a histogram of the optical flows: we separate the extracted flows into categories and count the number of flows in each category. In more detail, we split the scene into a grid of s by s cells (s = 10 in this case) to record the location of each feature, and then categorize the direction of each flow as one of the 8 motion directions considered in this problem. We then count, for each cell, the number of flows falling in each direction bin. This gives an s by s by 8 descriptor per frame (10 x 10 x 8 = 800 values). Finally, the summarizing step for each video can be either the average of the histograms in each cell over all frames (average pooling) or the maximum value of the histograms in each cell over all frames (max pooling).
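The histogram and pooling steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in emotion_classification_videos_faces.py: the names `frame_descriptor`, `average_pool` and `max_pool` are hypothetical, the flow is a nested list of (fx, fy) pairs rather than an OpenCV array, direction sectors are half-open ([0, 45), [45, 90), ...), and cells are visited row by row.

```python
import math


def frame_descriptor(flow, grid=10, n_dirs=8):
    """Count flow directions per grid cell for one frame.

    flow is a list of rows; each row is a list of (fx, fy) pairs.
    Returns a flat list of grid * grid * n_dirs counts.
    Assumption: pixels that do not fit evenly into the grid are ignored.
    """
    height, width = len(flow), len(flow[0])
    cell_h, cell_w = height // grid, width // grid
    hist = [0] * (grid * grid * n_dirs)
    for y in range(cell_h * grid):
        for x in range(cell_w * grid):
            fx, fy = flow[y][x]
            if fx == 0 and fy == 0:
                continue  # no motion, so there is no direction to count
            angle = math.degrees(math.atan2(fy, fx)) % 360.0
            direction = int(angle // (360.0 / n_dirs)) % n_dirs
            cell = (y // cell_h) * grid + (x // cell_w)
            hist[cell * n_dirs + direction] += 1
    return hist


def average_pool(frame_descriptors):
    """Mean of each histogram entry over all frames (average pooling)."""
    n = float(len(frame_descriptors))
    return [sum(values) / n for values in zip(*frame_descriptors)]


def max_pool(frame_descriptors):
    """Maximum of each histogram entry over all frames (max pooling)."""
    return [max(values) for values in zip(*frame_descriptors)]


# Two tiny 2x2 "frames" on a 2x2 grid, so every cell holds one pixel.
frame_a = [[(1, 0), (0, 1)], [(-1, 0), (0, 0)]]
frame_b = [[(1, 0), (0, 0)], [(0, -1), (0, 0)]]
descriptors = [frame_descriptor(f, grid=2) for f in (frame_a, frame_b)]
video_avg = average_pool(descriptors)
video_max = max_pool(descriptors)
```

Either pooled descriptor has the same length as a single frame descriptor, whatever the number of frames, which is what makes videos of different lengths comparable.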

For the classification process, we used a support vector machine (SVM) with a non-linear kernel, as discussed in class, to recognize new facial expressions. We also considered a Naïve Bayes classifier, but SVMs are generally reported to outperform it on computer vision tasks. A confusion matrix can be plotted to visualize the results (see Confusion_matrix.py).
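As a rough illustration of the confusion matrix mentioned above, the sketch below counts (true, predicted) pairs and normalizes each row so that it sums to 1, which is the same row normalization that Confusion_matrix.py applies. The function names are hypothetical, and the labels in the usage lines are made up purely to show the shapes; they are not results from this project.

```python
def confusion_counts(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        cm[index[t]][index[p]] += 1
    return cm


def normalize_rows(cm):
    """Divide each row by its total; rows with no samples stay at zero."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([v / float(total) if total else 0.0 for v in row])
    return normalized


# Made-up labels for two classes, only to show how the functions are called.
cm = confusion_counts([1, 1, 2, 2, 2], [1, 2, 2, 2, 1], classes=[1, 2])
cm_normalized = normalize_rows(cm)
```

Row normalization makes classes with different numbers of test videos comparable, since each row then reads as the fraction of that emotion's videos assigned to each prediction.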


================================================
FILE: detect_faces.py
================================================
#!/usr/bin/env python

import numpy as np
import cv2
import ntpath
import glob

# local modules
from video import create_capture
from common import clock, draw_str


help_message = '''
USAGE: facedetect.py [--cascade <cascade_fn>] [--nested-cascade <nested_fn>] [<video_source>]
'''

def detect(img, cascade):
    rects = cascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30), flags = cv2.CASCADE_SCALE_IMAGE)
    if len(rects) == 0:
        return []
    rects[:,2:] += rects[:,:2]
    return rects

def draw_rects(img, rects, color):
    for x1, y1, x2, y2 in rects:
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)

if __name__ == '__main__':
    import sys, getopt
    print(help_message)
    samples = 30
    
    path = '/Users/dhvanikotak/Box Sync/x'
    files = glob.glob(path+ "/*.mov")
    
    for fn in files:   

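        print(fn)
        # fn is a plain path with no option flags, so getopt returns no
        # options and hands the path back unchanged as video_src.
        args, video_src = getopt.getopt(fn, '', ['cascade=', 'nested-cascade='])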
        args = dict(args)
        cascade_fn = args.get('--cascade', "../../data/haarcascades/haarcascade_frontalface_alt2.xml")
        nested_fn  = args.get('--nested-cascade', "../../data/haarcascades/haarcascade_eye.xml")
    
        cascade = cv2.CascadeClassifier(cascade_fn)
        nested = cv2.CascadeClassifier(nested_fn)
    
        cam = create_capture(video_src, fallback='synth:bg=../data/lena.jpg:noise=0.05')
        fps = cam.get(cv2.CAP_PROP_FPS)  # property id 5: frames per second
        print(fps)
    
        # w = int(capture.get(cv2.cv.CV_CAP_PROP_FRAME_WIDTH))
        # h = int(capture.get(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT))
        # # video recorder
        # fourcc = cv2.cv.CV_FOURCC(*'XVID')  # cv2.VideoWriter_fourcc() does not exist
        # video_writer = cv2.VideoWriter("output.mov", fourcc, 25, (w, h))
    
        # fourcc = cv2.VideoWriter_fourcc(*'MOV')
        fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    
        name = ntpath.basename(fn)  
        out = cv2.VideoWriter('face_' + name , fourcc, fps, (240,320), True)
    
        while True:
            ret, img = cam.read()
            if not ret : break
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            gray = cv2.equalizeHist(gray)       
            t = clock()
            rects = detect(gray, cascade)
            
            vis = img.copy()
            # draw_rects(vis, rects, (0, 255, 0))       
            if not nested.empty():
                
                for x1, y1, x2, y2 in rects:
                    roi = gray[y1:y2, x1:x2]
                    vis_roi = vis[y1:y2, x1:x2]
                    # subrects = detect(roi.copy(), nested)
                    # draw_rects(vis_roi, subrects, (255, 0, 0))
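                    # cv2.resize takes (width, height), matching the writer size above.
                    res = cv2.resize(vis_roi,(240, 320), interpolation = cv2.INTER_CUBIC)
                    cv2.imshow('facedetect', res)
                    cv2.waitKey(1)  # give the preview window time to refresh
                    out.write(res)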


        out.release()

        cv2.destroyAllWindows()

    
        # if 0xFF & cv2.waitKey(5) == 27: break   



            


================================================
FILE: emotion_classification_videos_faces.py
================================================

# coding: utf-8

# In[3]:

import numpy as np
import cv2
import glob
from random import shuffle
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
import datetime


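def split_data(data, percentaje):
    # Shuffle in place, then cut the list at the requested training fraction.
    shuffle(data)
    train_n = int(percentaje*len(data))
    train, test = data[:train_n], data[train_n:]

    # Each item is [average-pooled descriptor, label, max-pooled descriptor];
    # only the first two fields are used for classification.
    samples_train = [item[0] for item in train]
    labels_train = [item[1] for item in train]

    samples_test = [item[0] for item in test]
    labels_test = [item[1] for item in test]

    return samples_train, labels_train, samples_test, labels_test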


def draw_flow(img, flow, step=16):

    h, w = img.shape[:2]
    # Integer division keeps the grid coordinates usable as array indices.
    y, x = np.mgrid[step//2:h:step, step//2:w:step].reshape(2,-1)
    fx, fy = flow[y,x].T
    lines = np.vstack([x, y, x+fx, y+fy]).T.reshape(-1, 2, 2)
    lines = np.int32(lines + 0.5)
    vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    cv2.polylines(vis, lines, 0, (0, 255, 0))
    for (x1, y1), (x2, y2) in lines:
        cv2.circle(vis, (x1, y1), 1, (0, 255, 0), -1)
    return vis


def calc_hist(flow):

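    # Count the flow vectors of one grid block in each of 8 direction
    # sectors of 45 degrees. Only the angle is used; the magnitude is ignored.
    mag, ang = cv2.cartToPolar(flow[...,0], flow[...,1], angleInDegrees = True)
    
    # Vectors with an angle of exactly 0 degrees fall outside every sector
    # and are not counted.
    q1 = ((0 < ang) & (ang <= 45)).sum()
    q2 = ((45 < ang) & (ang <= 90)).sum()
    q3 = ((90 < ang) & (ang <= 135)).sum()
    q4 = ((135 < ang) & (ang <= 180)).sum()
    q5 = ((180 < ang) & (ang <= 225)).sum()
    q6 = ((225 < ang) & (ang <= 270)).sum()
    q7 = ((270 < ang) & (ang <= 315)).sum()
    q8 = ((315 < ang) & (ang <= 360)).sum()
    
    hist = [q1, q2, q3, q4 ,q5, q6, q7 ,q8]
    
    return (hist)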


def process_video(fn, samples):

    video_hist = []
    hog_list = []
    sum_desc = []
    bins_n = 10

    cap = cv2.VideoCapture(fn)
    ret, prev = cap.read()
            
    prevgray = cv2.cvtColor(prev,cv2.COLOR_BGR2GRAY)
    hog = cv2.HOGDescriptor()


    while True:
           
        ret, img = cap.read()
        
        if not ret : break
        
        gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

        flow = cv2.calcOpticalFlowFarneback(prevgray,gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        
        prevgray = gray

        bins = np.hsplit(flow, bins_n)

        out_bins = []
        for b in bins:
            out_bins.append(np.vsplit(b, bins_n))

        frame_hist = []
        for col in out_bins:

            for block in col:
                frame_hist.append(calc_hist(block))
                                 
        video_hist.append(np.matrix(frame_hist) )

    # average per frame
    sum_desc = video_hist[0]
    for i in range(1, len(video_hist)):
        sum_desc = sum_desc + video_hist[i] 
    
    ave = sum_desc / len(video_hist)

    # max per bin
    maxx = np.amax(video_hist, 0)
    maxx = np.matrix(maxx)
    
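    fn = fn.lower()

    # The emotion is encoded in the file name as a '_xx_' tag;
    # 0 means that no known tag was found.
    label = 0
    if '_ha_' in fn: label = 1
    if '_su_' in fn: label = 2
    if '_sa_' in fn: label = 3
    if '_di_' in fn: label = 4
    if '_fe_' in fn: label = 5
    if '_an_' in fn: label = 6
            
    print(label)
    # NOTE: the descriptors are stored as uint8, so counts above 255 do not
    # fit; a wider dtype would be needed to keep them exact.
    ave_desc = np.asarray(ave)
    a_desc = []
    a_desc.append(np.asarray(ave_desc, dtype = np.uint8).ravel())

    max_desc = np.asarray(maxx)
    m_desc = np.asarray(max_desc, dtype = np.uint8).ravel()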

    return a_desc, label, m_desc


# In[4]:

if __name__ == '__main__':
        
    path = '/Users/dhvanikotak/Box Sync/CV Project/240x320/'
   # path = '/Users/soledad/Box Sync/Fall 15/I590 - Collective Intelligence/CV Project/240x320/'
    
#     folders = glob.glob(path+ "/*")
    folders = [path + 'Angry2', path + 'Surprised2', path + 'Disgusted2',
               path + 'Fear2', path + 'Sad2', path + 'Happy2']
    
    happy_data = []
    sad_data = []
    disgust_data = []
    fear_data = []
    surprise_data = []
    angry_data = []

    samples = 30
    a = datetime.datetime.now()
    for act in folders:
            fileList = glob.glob(act + "/*.mov")  
            print(len(fileList))

            for f in fileList:
                f = f.lower()
                print(f)

                # Match the full '_xx_' tag, as process_video does. A bare
                # substring such as 'an' also matches directory names in the
                # path, which would process a video more than once and file
                # it under the wrong emotion list.
                tagged = [('_ha_', happy_data), ('_sa_', sad_data),
                          ('_di_', disgust_data), ('_fe_', fear_data),
                          ('_su_', surprise_data), ('_an_', angry_data)]
                for tag, bucket in tagged:
                    if tag in f:
                        video_desc, label, maxx = process_video(f, samples)

                        if label != 0:
                            bucket.append([video_desc[0], label, maxx])
                            

    b = datetime.datetime.now()
    
    print (b-a)


# In[2]:

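try:
    import cPickle  # Python 2
except ImportError:
    import pickle as cPickle  # Python 3
percentaje = 0.7  # fraction of each emotion's videos used for training

clf = svm.SVC(kernel = 'rbf', C = 10, gamma = 0.0000001)

gnb = GaussianNB()
mnb = MultinomialNB()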

# all_data = happy_data + sad_data + fear_data + surprise_data + disgust_data + angry_data
     
times = 10

for i in range(0,times):
    # happiness
    happy_samples_train = []
    happy_labels_train = []
    happy_samples_test = []
    happy_labels_test = []
    if len(happy_data) > 0:
        happy_samples_train, happy_labels_train, happy_samples_test, happy_labels_test = split_data(happy_data, percentaje)
      
    # sadness
    sad_samples_train = []
    sad_labels_train = []
    sad_samples_test = []
    sad_labels_test = []
    if len(sad_data) > 0:
        sad_samples_train, sad_labels_train, sad_samples_test, sad_labels_test = split_data(sad_data, percentaje)
   
    # fear
    fear_samples_train = []
    fear_labels_train = []
    fear_samples_test = []
    fear_labels_test = []
    if len(fear_data) > 0:
        fear_samples_train, fear_labels_train, fear_samples_test, fear_labels_test = split_data(fear_data, percentaje)
    
    # surprise
    surprise_samples_train = []
    surprise_labels_train = []
    surprise_samples_test = []
    surprise_labels_test = []
    if len(surprise_data) > 0:
        surprise_samples_train, surprise_labels_train, surprise_samples_test, surprise_labels_test = split_data(surprise_data, percentaje)
  
    # disgust
    disgust_samples_train = []
    disgust_labels_train = []
    disgust_samples_test = []
    disgust_labels_test = []
    if len(disgust_data) > 0:
        disgust_samples_train, disgust_labels_train, disgust_samples_test, disgust_labels_test = split_data(disgust_data, percentaje)
    
    # anger
    angry_samples_train = []
    angry_labels_train = []
    angry_samples_test = []
    angry_labels_test = []
    if len(angry_data) > 0:
        angry_samples_train, angry_labels_train, angry_samples_test, angry_labels_test = split_data(angry_data, percentaje)
    
   
    
    train_set = happy_samples_train + sad_samples_train + fear_samples_train + surprise_samples_train + disgust_samples_train + angry_samples_train
    test_set = happy_samples_test + sad_samples_test + fear_samples_test + surprise_samples_test + disgust_samples_test + angry_samples_test
    labels_train = happy_labels_train + sad_labels_train + fear_labels_train + surprise_labels_train + disgust_labels_train + angry_labels_train
    labels_test = happy_labels_test + sad_labels_test + fear_labels_test + surprise_labels_test + disgust_labels_test + angry_labels_test 
     

    # train_set, labels_train, test_set, labels_test = split_data(all_data, percentaje)    

    clf.fit(train_set, labels_train)
    gnb.fit(train_set, labels_train)
    mnb.fit(train_set, labels_train)
    
    y_pred_g = gnb.predict(test_set)
    y_pred_m = mnb.predict(test_set)
    predicted = clf.predict(test_set) 
    
    # Fraction of test videos classified correctly by each model.
    acc_svm = (np.asarray(labels_test) == predicted).mean()
    acc_gnb = (np.asarray(labels_test) == y_pred_g).mean()
    acc_mnb = (np.asarray(labels_test) == y_pred_m).mean()
        
    print('accuracy svm: %.2f %% accuracy gnb: %.2f %% accuracy mnb: %.2f %%' % (acc_svm*100, acc_gnb*100, acc_mnb*100))

#     folder = '/Users/soledad/Box Sync/Fall 15/I590 - Collective Intelligence/CV Project/Code/Emotion_Out/'

    folder = '/Users/dhvanikotak/Box Sync/CV Project/Code/Emotion_Out/'

    outfile = open(folder + str(i)+'train_set.pkl', 'wb')
    np.save(outfile, train_set)
    outfile.close()
    
    outfile = open(folder + str(i)+'test_set.pkl', 'wb')
    np.save(outfile, test_set)
    outfile.close()
    
    outfile = open(folder + str(i)+'labels_train.pkl', 'wb')
    np.save(outfile, labels_train)
    outfile.close()
    
    outfile = open(folder + str(i)+'labels_test.pkl', 'wb')
    np.save(outfile, labels_test)
    outfile.close()

    # save the classifiers (the with block closes each file on exit)
    with open(folder + str(i)+'svm.pkl', 'wb') as fid:
        cPickle.dump(clf, fid)
    
    with open(folder + str(i)+'mnb.pkl', 'wb') as fid:
        cPickle.dump(mnb, fid)
    
    with open(folder + str(i)+'gnb.pkl', 'wb') as fid:
        cPickle.dump(gnb, fid)
    


