Full Code of yos1up/DNC for AI

Repository: yos1up/DNC
Branch: master
Commit: b574aba2c3fd
Files: 2
Total size: 11.6 KB

Directory structure:
gitextract_7wu1ofyq/
├── README.md
└── main.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Implementation of Differentiable Neural Computers (DNC) in Chainer

The Differentiable Neural Computer (DNC) is a neural network architecture proposed by DeepMind in [their third paper in Nature](http://www.nature.com/articles/nature20101.epdf?author_access_token=ImTXBI8aWbYxYQ51Plys8NRgN0jAjWel9jnR3ZoTv0MggmpDmwljGswxVdeocYSurJ3hxupzWuRNeGvvXnoO8o4jTJcnAyhGuZzXJ1GEaD-Z7E6X_a9R-xqJ9TfJWBqz).
I have implemented DNC in [Chainer](http://chainer.org/), a flexible neural network framework developed by [Preferred Networks](https://www.preferred-networks.jp/en/).

# What is DNC?
DNC is a newly proposed neural network. In the paper, DNC learns several complex tasks well, including finding the shortest path in a graph and solving a block puzzle game. It is expected to be able to solve complex, structured tasks that are inaccessible to previous neural networks.

DNC consists of an RNN (recurrent neural network) and a "memory matrix", with some heads for reading from and writing to it. The RNN can control the heads at will; it manipulates them through a fixed set of operations to read out the contents of the memory and to write data into it.
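
As a rough illustration of what reading and writing through a head means, here is a minimal pure-Python sketch. It assumes a single head whose position is a weighting over the N memory slots, and it mirrors the updates applied to `self.M` and `self.r` in `main.py`. The function names `write_memory` and `read_memory` are made up for this example and do not appear in the code.

```python
def write_memory(M, w_write, erase, value):
    """Soft write: M[i][j] <- M[i][j] * (1 - w[i] * e[j]) + w[i] * v[j]."""
    return [[m_ij * (1.0 - w_i * e_j) + w_i * v_j
             for m_ij, e_j, v_j in zip(row, erase, value)]
            for row, w_i in zip(M, w_write)]


def read_memory(M, w_read):
    """Soft read: weighted sum of memory rows, r[j] = sum_i w[i] * M[i][j]."""
    width = len(M[0])
    return [sum(w_i * row[j] for row, w_i in zip(M, w_read))
            for j in range(width)]


M = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # N = 3 slots, W = 2 values per slot
w = [0.0, 1.0, 0.0]                       # head fully focused on slot 1
M = write_memory(M, w, erase=[1.0, 1.0], value=[0.5, -0.25])
r = read_memory(M, w)                     # reads back what was just written
```

Because the weighting is a vector of real numbers rather than an integer address, both operations are differentiable, which is what lets the RNN learn where to read and write.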

In each timestep, a vanilla RNN receives some external input and yields some output (and refreshes its internal state). In contrast, the RNN in a DNC receives "data read by the read heads at the previous timestep" together with the external input, and yields a "memory manipulation command" in addition to the output data. In accordance with this command, the heads are moved, the memory content at the write head is edited, and the memory contents at the read heads are fetched. The fetched data is input to the RNN at the next timestep (together with the external input data).
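
The data flow of one timestep can be sketched as follows. This is a toy under stated assumptions, not the model itself: `ToyMemoryRNN`, `controller`, and `memory_step` are hypothetical names, and the two callables are stand-ins for the learned RNN and for the whole head-movement, write, and read procedure.

```python
class ToyMemoryRNN(object):
    """Toy data flow only: no learning and no real memory addressing."""

    def __init__(self, controller, memory_step, read_size):
        self.controller = controller    # hypothetical: chi -> (nu, command)
        self.memory_step = memory_step  # hypothetical: command -> read vector
        self.read_size = read_size
        self.reset_state()

    def reset_state(self):
        # Nothing has been read before the first timestep.
        self.r = [0.0] * self.read_size

    def step(self, x):
        chi = list(x) + self.r              # external input + previous read
        nu, command = self.controller(chi)  # output part + memory command
        self.r = self.memory_step(command)  # move heads, write, then read
        return nu


# Stand-in callables so the loop can run; a real DNC learns the controller.
rnn = ToyMemoryRNN(
    controller=lambda chi: (sum(chi), chi[:1]),
    memory_step=lambda command: [2.0 * command[0]],
    read_size=1,
)
outputs = [rnn.step([1.0]), rnn.step([1.0])]
```

The same input produces a different output on the second step because the previously read data is fed back in. `main.py` additionally adds a learned linear map of the newly read data to the output (`self.l_Wr(self.r) + self.nu`), which this toy omits.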

The RNN in a DNC learns to achieve the appropriate input-output relationship in a situation where "the memory" --- a convenient tool for computation --- is available for it to use freely. How the RNN utilizes the tool depends on what it learns.

Although "read-write memory" may seem very special, it can be regarded as a form of internal state of an RNN(*); in other words, DNC is an RNN that, like LSTM, has non-trivial internal-state dynamics, but its dynamics are far more complicated. This memory enables the RNN to perform complicated information processing. Moreover, since "read-write memory" is convenient for almost every kind of information processing, I expect DNC to be highly versatile --- able to perform various types of tasks reasonably well.

Note that their [NTM (Neural Turing Machine)](https://arxiv.org/pdf/1410.5401v2.pdf), proposed in 2014, has a structure similar to DNC. The difference between DNC and NTM is that DNC moves its memory heads in a more reasonable way. (For details, see the Methods section of the DNC paper.)
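
One concrete piece of that head movement, as far as `main.py` goes, is the allocation weighting computed by `u2a`, which steers the write head toward slots with low usage. The sketch below restates that computation on plain Python lists. The name `allocation_weighting` is chosen for this example only, and it leaves out the gradient tracking that the Chainer version keeps.

```python
def allocation_weighting(usage):
    """Give each slot (1 - usage) times the product of all smaller usages."""
    # Visit slots from least used to most used.
    order = sorted(range(len(usage)), key=lambda i: usage[i])
    alloc = [0.0] * len(usage)
    running_product = 1.0  # product of the usages of slots visited so far
    for i in order:
        alloc[i] = (1.0 - usage[i]) * running_product
        running_product *= usage[i]
    return alloc


# Slot 1 is completely free, so it receives all of the allocation weight.
alloc = allocation_weighting([1.0, 0.0, 0.5])
```

A slot only receives weight if every less-used slot is already partly occupied, so free slots are filled first.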

(*): They call the memory "external". They say this is because "The behaviour of the network is independent of the memory size as long as the memory is not filled to capacity".


# About my code

In my code, a very small-scale DNC learns a very easy "repeat after me" task. It seems to learn correctly without errors, but this does not necessarily mean that the program implements DNC correctly. If you have any comments about my code, please feel free to contact @yos1up (Twitter).
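
For reference, the "repeat after me" data built in the training loop of `main.py` can be sketched without NumPy as below. The helper name `make_repeat_example` is invented for this illustration, and `None` marks the steps that carry no target (the script uses a non-array placeholder there).

```python
import random


def onehot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def make_repeat_example(content, X):
    """Build inputs and targets for one "repeat after me" sequence.

    Symbol X - 1 is reserved as the delimiter, so content uses 0 .. X - 2.
    Targets are None while the content is still being presented.
    """
    n = len(content)
    inputs, targets = [], []
    for i in range(2 * n):
        if i < n:
            inputs.append(onehot(content[i], X))  # presentation phase
            targets.append(None)
        else:
            # Delimiter on the first recall step, then all-zero input.
            inputs.append(onehot(X - 1, X) if i == n else [0.0] * X)
            targets.append(onehot(content[i - n], X))
    return inputs, targets


X = 5
content = [random.randrange(X - 1) for _ in range(random.randint(3, 5))]
inputs, targets = make_repeat_example(content, X)
```

The network therefore has to hold 3 to 5 symbols in memory and replay them in order once the delimiter appears.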

The Supplementary Material of their paper is very useful for implementing DNC. In two pages, it lists ALL variables used in the model and ALL equations needed to construct the computational graph of the model. Most of the variable names in my code coincide with those in their paper.
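
As a small check on those variables, the sketch below lists the pieces that `main.py` splits out of the controller's interface vector `xi` and confirms the size used when the controller is constructed. The function name `interface_layout` is an assumption of this example. The variable names are the ones in the code, and the comments give the roles I understand them to have in the paper.

```python
def interface_layout(W, R):
    """Return (name, size) pairs in the order main.py splits `xi`."""
    return [
        ("kr", W * R),   # read keys
        ("betar", R),    # read strengths
        ("kw", W),       # write key
        ("betaw", 1),    # write strength
        ("e", W),        # erase vector
        ("v", W),        # write vector
        ("f", R),        # free gates
        ("ga", 1),       # allocation gate
        ("gw", 1),       # write gate
        ("pi", 3 * R),   # read modes (backward, content, forward)
    ]


X, Y, W, R = 5, 5, 10, 2  # the sizes used at the bottom of main.py
xi_size = sum(size for _, size in interface_layout(W, R))
controller_in = W * R + X      # previous read vectors + external input
controller_out = Y + xi_size   # output part nu + interface vector xi
```

With these sizes, `xi_size` equals `W*R + 3*W + 5*R + 3`, which matches the second argument passed to `DeepLSTM` in `DNC.__init__`.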


================================================
FILE: main.py
================================================
import numpy as np
import math
import chainer
from chainer import functions as F
from chainer import links as L
from chainer import \
     cuda, gradient_check, optimizers, serializers, utils, \
     Chain, ChainList, Function, Link, Variable


def onehot(x,n):
    ret = np.zeros(n).astype(np.float32)
    ret[x] = 1.0
    return ret

def overlap(u, v): # u, v: (1 * -) Variable  -> (1 * 1) Variable
    # cosine similarity between u and v
    denominator = F.sqrt(F.batch_l2_norm_squared(u) * F.batch_l2_norm_squared(v))
    if (np.array_equal(denominator.data, np.array([0]))):
        # avoid division by zero when either vector is all zeros
        return F.matmul(u, F.transpose(v))
    return F.matmul(u, F.transpose(v)) / F.reshape(denominator,(1,1))


def C(M, k, beta):
    # content-based addressing: softmax over memory slots of
    # beta * (cosine similarity between each row of M and the key k)
    # (N * W), (1 * W), (1 * 1) -> (N * 1)
    # (not (N * W), ({R,1} * W), (1 * {R,1}) -> (N * {R,1}))
    W = M.data.shape[1]
    ret_list = [0] * M.data.shape[0]
    for i in range(M.data.shape[0]):
        ret_list[i] = overlap(F.reshape(M[i,:], (1, W)), k) * beta # pick i-th row
    return F.transpose(F.softmax(F.transpose(F.concat(ret_list, 0)))) # concat vertically and calc softmax in each column



def u2a(u): # u, a: (N * 1) Variable
    # allocation weighting: visiting slots from least to most used, slot phi[i] gets
    # (1 - u[phi[i]]) times the product of the usages of all slots visited before it
    N = len(u.data)
    phi = np.argsort(u.data.reshape(N)) # u.data[phi]: ascending
    a_list = [0] * N
    cumprod = Variable(np.array([[1.0]]).astype(np.float32))
    for i in range(N):
        a_list[phi[i]] = cumprod * (1.0 - F.reshape(u[phi[i],0], (1,1)))
        cumprod *= F.reshape(u[phi[i],0], (1,1))
    return F.concat(a_list, 0) # concat vertically



class DeepLSTM(Chain): # too simple?
    def __init__(self, d_in, d_out):
        super(DeepLSTM, self).__init__(
            l1 = L.LSTM(d_in, d_out),
            l2 = L.Linear(d_out, d_out),)
    def __call__(self, x):
        self.x = x
        self.y = self.l2(self.l1(self.x))
        return self.y
    def reset_state(self):
        self.l1.reset_state()


    
class DNC(Chain):
    def __init__(self, X, Y, N, W, R):
        self.X = X # input dimension
        self.Y = Y # output dimension
        self.N = N # number of memory slots
        self.W = W # dimension of one memory slot
        self.R = R # number of read heads
        self.controller = DeepLSTM(W*R+X, Y+W*R+3*W+5*R+3)
        
        super(DNC, self).__init__(
            l_dl = self.controller,
            l_Wr = L.Linear(self.R * self.W, self.Y) # nobias=True ? 
            )# <question : should all learnable weights be here??>
        self.reset_state()
    def __call__(self, x):
        # <question : is batchsize>1 possible for RNN ? if No, I will implement calculations without batch dimension.>
        self.chi = F.concat((x, self.r))
        (self.nu, self.xi) = \
                  F.split_axis(self.l_dl(self.chi), [self.Y], 1)
        (self.kr, self.betar, self.kw, self.betaw,
         self.e, self.v, self.f, self.ga, self.gw, self.pi
         ) = F.split_axis(self.xi, np.cumsum(
             [self.W*self.R, self.R, self.W, 1, self.W, self.W, self.R, 1, 1]), 1)

        self.kr = F.reshape(self.kr, (self.R, self.W)) # R * W
        self.betar = 1 + F.softplus(self.betar) # 1 * R
        # self.kw: 1 * W
        self.betaw = 1 + F.softplus(self.betaw) # 1 * 1
        self.e = F.sigmoid(self.e) # 1 * W
        # self.v : 1 * W
        self.f = F.sigmoid(self.f) # 1 * R
        self.ga = F.sigmoid(self.ga) # 1 * 1
        self.gw = F.sigmoid(self.gw) # 1 * 1
        self.pi = F.softmax(F.reshape(self.pi, (self.R, 3))) # R * 3 (softmax for 3)

        # self.wr : N * R
        self.psi_mat = 1 - F.matmul(Variable(np.ones((self.N, 1)).astype(np.float32)), self.f) * self.wr # N * R
        self.psi = Variable(np.ones((self.N, 1)).astype(np.float32)) # N * 1
        for i in range(self.R):
            self.psi = self.psi * F.reshape(self.psi_mat[:,i],(self.N,1)) # N * 1

        # self.ww, self.u : N * 1
        self.u = (self.u + self.ww - (self.u * self.ww)) * self.psi
        
        self.a = u2a(self.u) # N * 1
        self.cw = C(self.M, self.kw, self.betaw) # N * 1
        self.ww = F.matmul(F.matmul(self.a, self.ga) + F.matmul(self.cw, 1.0 - self.ga), self.gw) # N * 1
        self.M = self.M * (np.ones((self.N, self.W)).astype(np.float32) - F.matmul(self.ww, self.e)) + F.matmul(self.ww, self.v) # N * W

        self.p = (1.0 - F.matmul(Variable(np.ones((self.N,1)).astype(np.float32)), F.reshape(F.sum(self.ww),(1,1)))) \
                  * self.p + self.ww # N * 1
        self.wwrep = F.matmul(self.ww, Variable(np.ones((1, self.N)).astype(np.float32))) # N * N
        self.L = (1.0 - self.wwrep - F.transpose(self.wwrep)) * self.L + F.matmul(self.ww, F.transpose(self.p)) # N * N
        self.L = self.L * (np.ones((self.N, self.N)) - np.eye(self.N)) # force L[i,i] == 0   

        self.fo = F.matmul(self.L, self.wr) # N * R
        self.ba = F.matmul(F.transpose(self.L), self.wr) # N * R

        self.cr_list = [0] * self.R
        for i in range(self.R):
            self.cr_list[i] = C(self.M, F.reshape(self.kr[i,:],(1, self.W)),
                                F.reshape(self.betar[0,i],(1, 1))) # N * 1
        self.cr = F.concat(self.cr_list) # N * R

        self.bacrfo = F.concat((F.reshape(F.transpose(self.ba),(self.R,self.N,1)),
                               F.reshape(F.transpose(self.cr),(self.R,self.N,1)),
                               F.reshape(F.transpose(self.fo) ,(self.R,self.N,1)),),2) # R * N * 3
        self.pi = F.reshape(self.pi, (self.R,3,1)) # R * 3 * 1
        self.wr = F.transpose(F.reshape(F.batch_matmul(self.bacrfo, self.pi), (self.R, self.N))) # N * R
            
        self.r = F.reshape(F.matmul(F.transpose(self.M), self.wr),(1, self.R * self.W)) # W * R (-> 1 * RW)
        
        self.y = self.l_Wr(self.r) + self.nu # 1 * Y
        return self.y
    def reset_state(self):
        self.l_dl.reset_state()
        self.u = Variable(np.zeros((self.N, 1)).astype(np.float32))
        self.p = Variable(np.zeros((self.N, 1)).astype(np.float32))
        self.L = Variable(np.zeros((self.N, self.N)).astype(np.float32))                           
        self.M = Variable(np.zeros((self.N, self.W)).astype(np.float32))
        self.r = Variable(np.zeros((1, self.R*self.W)).astype(np.float32))
        self.wr = Variable(np.zeros((self.N, self.R)).astype(np.float32))
        self.ww = Variable(np.zeros((self.N, 1)).astype(np.float32))
        # any variable else ?

X = 5
Y = 5
N = 10
W = 10
R = 2
mdl = DNC(X, Y, N, W, R)
opt = optimizers.Adam()
opt.setup(mdl)
datanum = 100000
loss = 0.0
acc = 0.0
for datacnt in range(datanum):
    lossfrac = np.zeros((1,2))
    # x_seq = np.random.rand(X,seqlen).astype(np.float32)
    # t_seq = np.random.rand(Y,seqlen).astype(np.float32)
    # t_seq = np.copy(x_seq)

    contentlen = np.random.randint(3,6)
    content = np.random.randint(0,X-1,contentlen)
    seqlen = contentlen + contentlen
    x_seq_list = [float('nan')] * seqlen
    t_seq_list = [float('nan')] * seqlen    
    for i in range(seqlen):
        if (i < contentlen):
            x_seq_list[i] = onehot(content[i],X)
        elif (i == contentlen):
            x_seq_list[i] = onehot(X-1,X)
        else:
            x_seq_list[i] = np.zeros(X).astype(np.float32)
            
        if (i >= contentlen):
            t_seq_list[i] = onehot(content[i-contentlen],X)    
    
    mdl.reset_state()
    for cnt in range(seqlen):
        x = Variable(x_seq_list[cnt].reshape(1,X))
        if (isinstance(t_seq_list[cnt], np.ndarray)):
            t = Variable(t_seq_list[cnt].reshape(1,Y))
        else:
            t = []
            
        y = mdl(x)
        if (isinstance(t,chainer.Variable)):
            loss += (y - t)**2
            print('{} {} {}'.format(y.data, t.data, np.argmax(y.data)==np.argmax(t.data)))
            if (np.argmax(y.data)==np.argmax(t.data)): acc += 1
        if (cnt+1==seqlen):
            mdl.cleargrads()
            loss.grad = np.ones(loss.data.shape, dtype=np.float32)
            loss.backward()
            opt.update()
            loss.unchain_backward()
            print('( {} ) {} {}'.format(datacnt, loss.data.sum()/loss.data.size/contentlen, acc/contentlen))
            lossfrac += [loss.data.sum()/loss.data.size/seqlen, 1.]
            loss = 0.0
            acc = 0.0
