[
  {
    "path": "README.md",
    "content": "# FastGCN\nThis is the Tensorflow implementation of our ICLR2018 paper: [\"**FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling**\".](https://openreview.net/forum?id=rytstxWAW&noteId=ByU9EpGSf)\n\n\nInstructions of the sample codes:\n\n[For Reddit dataset]\n\n\ttrain_batch_multiRank_inductive_reddit_Mixlayers_sampleA.py is the final model. (precomputated the AH in the bottom layer) The original Reddit data should be transferred into the .npz format using this function: transferRedditDataFormat.\n\tNote: By default, this code does no sampling. To enable sampling, change `main(None)` at the bottom to `main(100)`. (The number is the sample size. You can also try other sample sizes)\n\n\ttrain_batch_multiRank_inductive_reddit_Mixlayers_uniform.py is the model for uniform sampling.\n\n\ttrain_batch_multiRank_inductive_reddit_Mixlayers_appr2layers.py is the model for 2-layer approximation.\n\n\tcreate_Graph_forGraphSAGE.py is used to transfer the data into the GraphSAGE format, so that users can compare our method with GraphSAGE. We also include the transferred original Cora dataset in this repository (./data/cora_graphSAGE).\n\n\n[For pubmed or cora]\n\n\ttrain.py is the original GCN model.\n\n \tpubmed_Mix_sampleA.py \tThe dataset could be defined in the codes, for example: flags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')\n\n\tpubmed_Mix_uniform.py and pubmed_inductive_appr2layers.py are similar to the ones for reddit.\n\n\tpubmed-original**.py means the codes are used for original Cora or Pubmed datasets. Users could also change their datasets by changing the data load function from load_data() to load_data_original().\n"
  },
  {
    "path": "__init__.py",
    "content": "from __future__ import print_function\nfrom __future__ import division\n"
  },
  {
    "path": "create_Graph.py",
    "content": "import numpy as np\nimport pickle as pkl\nimport scipy.sparse as sp\nimport sys\nimport os\nimport networkx as nx\nfrom utils import *\nimport json\nfrom networkx.readwrite import json_graph\n\n # 'cora', 'citeseer', 'pubmed'\n\nif __name__==\"__main__\":\n    data_name = 'cora'\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(data_name)\n\n    G = nx.from_scipy_sparse_matrix(adj)\n\n    val_index = np.where(val_mask)[0]\n    test_index = np.where(test_mask)[0]\n    y = y_train+y_val+y_test\n    y = np.argmax(y,axis=1)\n\n\n    for i in range(len(y)):\n        if i in val_index:\n            G.node[i]['val']=True\n            G.node[i]['test']=False\n        elif i in test_index:\n            G.node[i]['test']=True\n            G.node[i]['val']=False\n        else:\n            G.node[i]['test'] = False\n            G.node[i]['val'] = False\n\n\n    data = json_graph.node_link_data(G)\n    with open(\"cora/cora-G.json\",\"wb\") as f:\n        json.dump(data,f)\n    classMap = {}\n    idMap = {}\n    for i in range(len(y)):\n        classMap[i]=y[i]\n        idMap[i] = i\n    with open(\"cora/cora-id_map.json\",\"wb\") as f:\n        json.dump(idMap,f)\n    with open(\"cora/cora-class_map.json\",\"wb\") as f:\n        json.dump(classMap,f)\n    np.save(open(\"cora/cora-feats.npy\",\"wb\"), features.todense())\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "create_Graph_forGraphSAGE.py",
    "content": "import numpy as np\nimport pickle as pkl\nimport scipy.sparse as sp\nimport sys\nimport os\nimport networkx as nx\nfrom utils import *\nimport json\nfrom networkx.readwrite import json_graph\n\n # 'cora', 'citeseer', 'pubmed'\n\nif __name__==\"__main__\":\n    data_name = 'cora'\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data_original(data_name)\n\n    G = nx.from_scipy_sparse_matrix(adj)\n    train_index = np.where(train_mask)[0]\n    val_index = np.where(val_mask)[0]\n    test_index = np.where(test_mask)[0]\n    y = y_train+y_val+y_test\n    y = np.argmax(y,axis=1)\n\n    train_attr, val_attr, test_attr = ({i: bool(m) for i, m in enumerate(mask)} for mask in (train_mask, val_mask, test_mask))\n\n    nx.set_node_attributes(G, train_attr, 'train')\n    nx.set_node_attributes(G, val_attr, 'val')\n    nx.set_node_attributes(G, test_attr, 'test')\n    \n    data = json_graph.node_link_data(G)\n    with open(\"%s/%s0-G.json\" % (data_name, data_name), \"wb\") as f:\n        json.dump(data,f)\n    classMap = {}\n    idMap = {}\n    for i in range(len(y)):\n        classMap[i]=y[i]\n        idMap[i] = i\n\n    with open(\"%s/%s0-id_map.json\" % (data_name, data_name), \"wb\") as f:\n        json.dump(idMap,f)\n    with open(\"%s/%s0-class_map.json\" % (data_name, data_name), \"wb\") as f:\n        json.dump(classMap,f)\n\n    np.save(\"%s/%s0-feats.npy\" % (data_name, data_name), features.todense())\n"
  },
  {
    "path": "data/ind.citeseer.test.index",
    "content": "2488\n2644\n3261\n2804\n3176\n2432\n3310\n2410\n2812\n2520\n2994\n3282\n2680\n2848\n2670\n3005\n2977\n2592\n2967\n2461\n3184\n2852\n2768\n2905\n2851\n3129\n3164\n2438\n2793\n2763\n2528\n2954\n2347\n2640\n3265\n2874\n2446\n2856\n3149\n2374\n3097\n3301\n2664\n2418\n2655\n2464\n2596\n3262\n3278\n2320\n2612\n2614\n2550\n2626\n2772\n3007\n2733\n2516\n2476\n2798\n2561\n2839\n2685\n2391\n2705\n3098\n2754\n3251\n2767\n2630\n2727\n2513\n2701\n3264\n2792\n2821\n3260\n2462\n3307\n2639\n2900\n3060\n2672\n3116\n2731\n3316\n2386\n2425\n2518\n3151\n2586\n2797\n2479\n3117\n2580\n3182\n2459\n2508\n3052\n3230\n3215\n2803\n2969\n2562\n2398\n3325\n2343\n3030\n2414\n2776\n2383\n3173\n2850\n2499\n3312\n2648\n2784\n2898\n3056\n2484\n3179\n3132\n2577\n2563\n2867\n3317\n2355\n3207\n3178\n2968\n3319\n2358\n2764\n3001\n2683\n3271\n2321\n2567\n2502\n3246\n2715\n3066\n2390\n2381\n3162\n2741\n2498\n2790\n3038\n3321\n2481\n3050\n3161\n3122\n2801\n2957\n3177\n2965\n2621\n3208\n2921\n2802\n2357\n2677\n2519\n2860\n2696\n2368\n3241\n2858\n2419\n2762\n2875\n3222\n3064\n2827\n3044\n2471\n3062\n2982\n2736\n2322\n2709\n2766\n2424\n2602\n2970\n2675\n3299\n2554\n2964\n2597\n2753\n2979\n2523\n2912\n2896\n2317\n3167\n2813\n2482\n2557\n3043\n3244\n2985\n2460\n2363\n3272\n3045\n3192\n2453\n2656\n2834\n2443\n3202\n2926\n2711\n2633\n2384\n2752\n3285\n2817\n2483\n2919\n2924\n2661\n2698\n2361\n2662\n2819\n3143\n2316\n3196\n2739\n2345\n2578\n2822\n3229\n2908\n2917\n2692\n3200\n2324\n2522\n3322\n2697\n3163\n3093\n3233\n2774\n2371\n2835\n2652\n2539\n2843\n3231\n2976\n2429\n2367\n3144\n2564\n3283\n3217\n3035\n2962\n2433\n2415\n2387\n3021\n2595\n2517\n2468\n3061\n2673\n2348\n3027\n2467\n3318\n2959\n3273\n2392\n2779\n2678\n3004\n2634\n2974\n3198\n2342\n2376\n3249\n2868\n2952\n2710\n2838\n2335\n2524\n2650\n3186\n2743\n2545\n2841\n2515\n2505\n3181\n2945\n2738\n2933\n3303\n2611\n3090\n2328\n3010\n3016\n2504\n2936\n3266\n3253\n2840\n3034\n2581\n2344\n2452\n2654\n3199\n3137\n2514\n2394\n2544\n2641\n2613\n2618\n2558\n2593\n2532\n2512\n2975\n3267\n2566\n2951\n3300\n2869\n2629\n2747\n3055\n2831\n3105\n3168\n3100\n2431\n2828\n2684\n3269\n2910\n2865\n2693\n2884\n3228\n2783\n3247\n2770\n3157\n2421\n2382\n2331\n3203\n3240\n2351\n3114\n2986\n2688\n2439\n2996\n3079\n3103\n3296\n2349\n2372\n3096\n2422\n2551\n3069\n2737\n3084\n3304\n3022\n2542\n3204\n2949\n2318\n2450\n3140\n2734\n2881\n2576\n3054\n3089\n3125\n2761\n3136\n3111\n2427\n2466\n3101\n3104\n3259\n2534\n2961\n3191\n3000\n3036\n2356\n2800\n3155\n3224\n2646\n2735\n3020\n2866\n2426\n2448\n3226\n3219\n2749\n3183\n2906\n2360\n2440\n2946\n2313\n2859\n2340\n3008\n2719\n3058\n2653\n3023\n2888\n3243\n2913\n3242\n3067\n2409\n3227\n2380\n2353\n2686\n2971\n2847\n2947\n2857\n3263\n3218\n2861\n3323\n2635\n2966\n2604\n2456\n2832\n2694\n3245\n3119\n2942\n3153\n2894\n2555\n3128\n2703\n2323\n2631\n2732\n2699\n2314\n2590\n3127\n2891\n2873\n2814\n2326\n3026\n3288\n3095\n2706\n2457\n2377\n2620\n2526\n2674\n3190\n2923\n3032\n2334\n3254\n2991\n3277\n2973\n2599\n2658\n2636\n2826\n3148\n2958\n3258\n2990\n3180\n2538\n2748\n2625\n2565\n3011\n3057\n2354\n3158\n2622\n3308\n2983\n2560\n3169\n3059\n2480\n3194\n3291\n3216\n2643\n3172\n2352\n2724\n2485\n2411\n2948\n2445\n2362\n2668\n3275\n3107\n2496\n2529\n2700\n2541\n3028\n2879\n2660\n3324\n2755\n2436\n3048\n2623\n2920\n3040\n2568\n3221\n3003\n3295\n2473\n3232\n3213\n2823\n2897\n2573\n2645\n3018\n3326\n2795\n2915\n3109\n3086\n2463\n3118\n2671\n2909\n2393\n2325\n3029\n2972\n3110\n2870\n3284\n2816\n2647\n2667\n2955\n2333\n2960\n2864\n2893\n2458\n2441\n2359\n2327\n3256\
n3099\n3073\n3138\n2511\n2666\n2548\n2364\n2451\n2911\n3237\n3206\n3080\n3279\n2934\n2981\n2878\n3130\n2830\n3091\n2659\n2449\n3152\n2413\n2722\n2796\n3220\n2751\n2935\n3238\n2491\n2730\n2842\n3223\n2492\n3074\n3094\n2833\n2521\n2883\n3315\n2845\n2907\n3083\n2572\n3092\n2903\n2918\n3039\n3286\n2587\n3068\n2338\n3166\n3134\n2455\n2497\n2992\n2775\n2681\n2430\n2932\n2931\n2434\n3154\n3046\n2598\n2366\n3015\n3147\n2944\n2582\n3274\n2987\n2642\n2547\n2420\n2930\n2750\n2417\n2808\n3141\n2997\n2995\n2584\n2312\n3033\n3070\n3065\n2509\n3314\n2396\n2543\n2423\n3170\n2389\n3289\n2728\n2540\n2437\n2486\n2895\n3017\n2853\n2406\n2346\n2877\n2472\n3210\n2637\n2927\n2789\n2330\n3088\n3102\n2616\n3081\n2902\n3205\n3320\n3165\n2984\n3185\n2707\n3255\n2583\n2773\n2742\n3024\n2402\n2718\n2882\n2575\n3281\n2786\n2855\n3014\n2401\n2535\n2687\n2495\n3113\n2609\n2559\n2665\n2530\n3293\n2399\n2605\n2690\n3133\n2799\n2533\n2695\n2713\n2886\n2691\n2549\n3077\n3002\n3049\n3051\n3087\n2444\n3085\n3135\n2702\n3211\n3108\n2501\n2769\n3290\n2465\n3025\n3019\n2385\n2940\n2657\n2610\n2525\n2941\n3078\n2341\n2916\n2956\n2375\n2880\n3009\n2780\n2370\n2925\n2332\n3146\n2315\n2809\n3145\n3106\n2782\n2760\n2493\n2765\n2556\n2890\n2400\n2339\n3201\n2818\n3248\n3280\n2570\n2569\n2937\n3174\n2836\n2708\n2820\n3195\n2617\n3197\n2319\n2744\n2615\n2825\n2603\n2914\n2531\n3193\n2624\n2365\n2810\n3239\n3159\n2537\n2844\n2758\n2938\n3037\n2503\n3297\n2885\n2608\n2494\n2712\n2408\n2901\n2704\n2536\n2373\n2478\n2723\n3076\n2627\n2369\n2669\n3006\n2628\n2788\n3276\n2435\n3139\n3235\n2527\n2571\n2815\n2442\n2892\n2978\n2746\n3150\n2574\n2725\n3188\n2601\n2378\n3075\n2632\n2794\n3270\n3071\n2506\n3126\n3236\n3257\n2824\n2989\n2950\n2428\n2405\n3156\n2447\n2787\n2805\n2720\n2403\n2811\n2329\n2474\n2785\n2350\n2507\n2416\n3112\n2475\n2876\n2585\n2487\n3072\n3082\n2943\n2757\n2388\n2600\n3294\n2756\n3142\n3041\n2594\n2998\n3047\n2379\n2980\n2454\n2862\n3175\n2588\n3031\n3012\n2889\n2500\n2791\n2854\n2619\n2395\n2807\n2740\n2412\n3131\n3013\n2939\n2651\n2490\n2988\n2863\n3225\n2745\n2714\n3160\n3124\n2849\n2676\n2872\n3287\n3189\n2716\n3115\n2928\n2871\n2591\n2717\n2546\n2777\n3298\n2397\n3187\n2726\n2336\n3268\n2477\n2904\n2846\n3121\n2899\n2510\n2806\n2963\n3313\n2679\n3302\n2663\n3053\n2469\n2999\n3311\n2470\n2638\n3120\n3171\n2689\n2922\n2607\n2721\n2993\n2887\n2837\n2929\n2829\n3234\n2649\n2337\n2759\n2778\n2771\n2404\n2589\n3123\n3209\n2729\n3252\n2606\n2579\n2552\n"
  },
  {
    "path": "data/ind.cora.test.index",
    "content": "2692\n2532\n2050\n1715\n2362\n2609\n2622\n1975\n2081\n1767\n2263\n1725\n2588\n2259\n2357\n1998\n2574\n2179\n2291\n2382\n1812\n1751\n2422\n1937\n2631\n2510\n2378\n2589\n2345\n1943\n1850\n2298\n1825\n2035\n2507\n2313\n1906\n1797\n2023\n2159\n2495\n1886\n2122\n2369\n2461\n1925\n2565\n1858\n2234\n2000\n1846\n2318\n1723\n2559\n2258\n1763\n1991\n1922\n2003\n2662\n2250\n2064\n2529\n1888\n2499\n2454\n2320\n2287\n2203\n2018\n2002\n2632\n2554\n2314\n2537\n1760\n2088\n2086\n2218\n2605\n1953\n2403\n1920\n2015\n2335\n2535\n1837\n2009\n1905\n2636\n1942\n2193\n2576\n2373\n1873\n2463\n2509\n1954\n2656\n2455\n2494\n2295\n2114\n2561\n2176\n2275\n2635\n2442\n2704\n2127\n2085\n2214\n2487\n1739\n2543\n1783\n2485\n2262\n2472\n2326\n1738\n2170\n2100\n2384\n2152\n2647\n2693\n2376\n1775\n1726\n2476\n2195\n1773\n1793\n2194\n2581\n1854\n2524\n1945\n1781\n1987\n2599\n1744\n2225\n2300\n1928\n2042\n2202\n1958\n1816\n1916\n2679\n2190\n1733\n2034\n2643\n2177\n1883\n1917\n1996\n2491\n2268\n2231\n2471\n1919\n1909\n2012\n2522\n1865\n2466\n2469\n2087\n2584\n2563\n1924\n2143\n1736\n1966\n2533\n2490\n2630\n1973\n2568\n1978\n2664\n2633\n2312\n2178\n1754\n2307\n2480\n1960\n1742\n1962\n2160\n2070\n2553\n2433\n1768\n2659\n2379\n2271\n1776\n2153\n1877\n2027\n2028\n2155\n2196\n2483\n2026\n2158\n2407\n1821\n2131\n2676\n2277\n2489\n2424\n1963\n1808\n1859\n2597\n2548\n2368\n1817\n2405\n2413\n2603\n2350\n2118\n2329\n1969\n2577\n2475\n2467\n2425\n1769\n2092\n2044\n2586\n2608\n1983\n2109\n2649\n1964\n2144\n1902\n2411\n2508\n2360\n1721\n2005\n2014\n2308\n2646\n1949\n1830\n2212\n2596\n1832\n1735\n1866\n2695\n1941\n2546\n2498\n2686\n2665\n1784\n2613\n1970\n2021\n2211\n2516\n2185\n2479\n2699\n2150\n1990\n2063\n2075\n1979\n2094\n1787\n2571\n2690\n1926\n2341\n2566\n1957\n1709\n1955\n2570\n2387\n1811\n2025\n2447\n2696\n2052\n2366\n1857\n2273\n2245\n2672\n2133\n2421\n1929\n2125\n2319\n2641\n2167\n2418\n1765\n1761\n1828\n2188\n1972\n1997\n2419\n2289\n2296\n2587\n2051\n2440\n2053\n2191\n1923\n2164\n1861\n2339\n2333\n2523\n2670\n2121\n1921\n1724\n2253\n2374\n1940\n2545\n2301\n2244\n2156\n1849\n2551\n2011\n2279\n2572\n1757\n2400\n2569\n2072\n2526\n2173\n2069\n2036\n1819\n1734\n1880\n2137\n2408\n2226\n2604\n1771\n2698\n2187\n2060\n1756\n2201\n2066\n2439\n1844\n1772\n2383\n2398\n1708\n1992\n1959\n1794\n2426\n2702\n2444\n1944\n1829\n2660\n2497\n2607\n2343\n1730\n2624\n1790\n1935\n1967\n2401\n2255\n2355\n2348\n1931\n2183\n2161\n2701\n1948\n2501\n2192\n2404\n2209\n2331\n1810\n2363\n2334\n1887\n2393\n2557\n1719\n1732\n1986\n2037\n2056\n1867\n2126\n1932\n2117\n1807\n1801\n1743\n2041\n1843\n2388\n2221\n1833\n2677\n1778\n2661\n2306\n2394\n2106\n2430\n2371\n2606\n2353\n2269\n2317\n2645\n2372\n2550\n2043\n1968\n2165\n2310\n1985\n2446\n1982\n2377\n2207\n1818\n1913\n1766\n1722\n1894\n2020\n1881\n2621\n2409\n2261\n2458\n2096\n1712\n2594\n2293\n2048\n2359\n1839\n2392\n2254\n1911\n2101\n2367\n1889\n1753\n2555\n2246\n2264\n2010\n2336\n2651\n2017\n2140\n1842\n2019\n1890\n2525\n2134\n2492\n2652\n2040\n2145\n2575\n2166\n1999\n2434\n1711\n2276\n2450\n2389\n2669\n2595\n1814\n2039\n2502\n1896\n2168\n2344\n2637\n2031\n1977\n2380\n1936\n2047\n2460\n2102\n1745\n2650\n2046\n2514\n1980\n2352\n2113\n1713\n2058\n2558\n1718\n1864\n1876\n2338\n1879\n1891\n2186\n2451\n2181\n2638\n2644\n2103\n2591\n2266\n2468\n1869\n2582\n2674\n2361\n2462\n1748\n2215\n2615\n2236\n2248\n2493\n2342\n2449\n2274\n1824\n1852\n1870\n2441\n2356\n1835\n2694\n2602\n2685\n1893\n2544\n2536\n1994\n1853\n1838\n1786\n1930\n2539\n1892\n2265\n2618\n2486\n2583\n2061\n1796\n1806\n2084\n1933\n2095\n2136\
n2078\n1884\n2438\n2286\n2138\n1750\n2184\n1799\n2278\n2410\n2642\n2435\n1956\n2399\n1774\n2129\n1898\n1823\n1938\n2299\n1862\n2420\n2673\n1984\n2204\n1717\n2074\n2213\n2436\n2297\n2592\n2667\n2703\n2511\n1779\n1782\n2625\n2365\n2315\n2381\n1788\n1714\n2302\n1927\n2325\n2506\n2169\n2328\n2629\n2128\n2655\n2282\n2073\n2395\n2247\n2521\n2260\n1868\n1988\n2324\n2705\n2541\n1731\n2681\n2707\n2465\n1785\n2149\n2045\n2505\n2611\n2217\n2180\n1904\n2453\n2484\n1871\n2309\n2349\n2482\n2004\n1965\n2406\n2162\n1805\n2654\n2007\n1947\n1981\n2112\n2141\n1720\n1758\n2080\n2330\n2030\n2432\n2089\n2547\n1820\n1815\n2675\n1840\n2658\n2370\n2251\n1908\n2029\n2068\n2513\n2549\n2267\n2580\n2327\n2351\n2111\n2022\n2321\n2614\n2252\n2104\n1822\n2552\n2243\n1798\n2396\n2663\n2564\n2148\n2562\n2684\n2001\n2151\n2706\n2240\n2474\n2303\n2634\n2680\n2055\n2090\n2503\n2347\n2402\n2238\n1950\n2054\n2016\n1872\n2233\n1710\n2032\n2540\n2628\n1795\n2616\n1903\n2531\n2567\n1946\n1897\n2222\n2227\n2627\n1856\n2464\n2241\n2481\n2130\n2311\n2083\n2223\n2284\n2235\n2097\n1752\n2515\n2527\n2385\n2189\n2283\n2182\n2079\n2375\n2174\n2437\n1993\n2517\n2443\n2224\n2648\n2171\n2290\n2542\n2038\n1855\n1831\n1759\n1848\n2445\n1827\n2429\n2205\n2598\n2657\n1728\n2065\n1918\n2427\n2573\n2620\n2292\n1777\n2008\n1875\n2288\n2256\n2033\n2470\n2585\n2610\n2082\n2230\n1915\n1847\n2337\n2512\n2386\n2006\n2653\n2346\n1951\n2110\n2639\n2520\n1939\n2683\n2139\n2220\n1910\n2237\n1900\n1836\n2197\n1716\n1860\n2077\n2519\n2538\n2323\n1914\n1971\n1845\n2132\n1802\n1907\n2640\n2496\n2281\n2198\n2416\n2285\n1755\n2431\n2071\n2249\n2123\n1727\n2459\n2304\n2199\n1791\n1809\n1780\n2210\n2417\n1874\n1878\n2116\n1961\n1863\n2579\n2477\n2228\n2332\n2578\n2457\n2024\n1934\n2316\n1841\n1764\n1737\n2322\n2239\n2294\n1729\n2488\n1974\n2473\n2098\n2612\n1834\n2340\n2423\n2175\n2280\n2617\n2208\n2560\n1741\n2600\n2059\n1747\n2242\n2700\n2232\n2057\n2147\n2682\n1792\n1826\n2120\n1895\n2364\n2163\n1851\n2391\n2414\n2452\n1803\n1989\n2623\n2200\n2528\n2415\n1804\n2146\n2619\n2687\n1762\n2172\n2270\n2678\n2593\n2448\n1882\n2257\n2500\n1899\n2478\n2412\n2107\n1746\n2428\n2115\n1800\n1901\n2397\n2530\n1912\n2108\n2206\n2091\n1740\n2219\n1976\n2099\n2142\n2671\n2668\n2216\n2272\n2229\n2666\n2456\n2534\n2697\n2688\n2062\n2691\n2689\n2154\n2590\n2626\n2390\n1813\n2067\n1952\n2518\n2358\n1789\n2076\n2049\n2119\n2013\n2124\n2556\n2105\n2093\n1885\n2305\n2354\n2135\n2601\n1770\n1995\n2504\n1749\n2157\n"
  },
  {
    "path": "data/ind.pubmed.test.index",
    "content": "18747\n19392\n19181\n18843\n19221\n18962\n19560\n19097\n18966\n19014\n18756\n19313\n19000\n19569\n19359\n18854\n18970\n19073\n19661\n19180\n19377\n18750\n19401\n18788\n19224\n19447\n19017\n19241\n18890\n18908\n18965\n19001\n18849\n19641\n18852\n19222\n19172\n18762\n19156\n19162\n18856\n18763\n19318\n18826\n19712\n19192\n19695\n19030\n19523\n19249\n19079\n19232\n19455\n18743\n18800\n19071\n18885\n19593\n19394\n19390\n18832\n19445\n18838\n19632\n19548\n19546\n18825\n19498\n19266\n19117\n19595\n19252\n18730\n18913\n18809\n19452\n19520\n19274\n19555\n19388\n18919\n19099\n19637\n19403\n18720\n19526\n18905\n19451\n19408\n18923\n18794\n19322\n19431\n18912\n18841\n19239\n19125\n19258\n19565\n18898\n19482\n19029\n18778\n19096\n19684\n19552\n18765\n19361\n19171\n19367\n19623\n19402\n19327\n19118\n18888\n18726\n19510\n18831\n19490\n19576\n19050\n18729\n18896\n19246\n19012\n18862\n18873\n19193\n19693\n19474\n18953\n19115\n19182\n19269\n19116\n18837\n18872\n19007\n19212\n18798\n19102\n18772\n19660\n19511\n18914\n18886\n19672\n19360\n19213\n18810\n19420\n19512\n18719\n19432\n19350\n19127\n18782\n19587\n18924\n19488\n18781\n19340\n19190\n19383\n19094\n18835\n19487\n19230\n18791\n18882\n18937\n18928\n18755\n18802\n19516\n18795\n18786\n19273\n19349\n19398\n19626\n19130\n19351\n19489\n19446\n18959\n19025\n18792\n18878\n19304\n19629\n19061\n18785\n19194\n19179\n19210\n19417\n19583\n19415\n19443\n18739\n19662\n18904\n18910\n18901\n18960\n18722\n18827\n19290\n18842\n19389\n19344\n18961\n19098\n19147\n19334\n19358\n18829\n18984\n18931\n18742\n19320\n19111\n19196\n18887\n18991\n19469\n18990\n18876\n19261\n19270\n19522\n19088\n19284\n19646\n19493\n19225\n19615\n19449\n19043\n19674\n19391\n18918\n19155\n19110\n18815\n19131\n18834\n19715\n19603\n19688\n19133\n19053\n19166\n19066\n18893\n18757\n19582\n19282\n19257\n18869\n19467\n18954\n19371\n19151\n19462\n19598\n19653\n19187\n19624\n19564\n19534\n19581\n19478\n18985\n18746\n19342\n18777\n19696\n18824\n19138\n18728\n19643\n19199\n18731\n19168\n18948\n19216\n19697\n19347\n18808\n18725\n19134\n18847\n18828\n18996\n19106\n19485\n18917\n18911\n18776\n19203\n19158\n18895\n19165\n19382\n18780\n18836\n19373\n19659\n18947\n19375\n19299\n18761\n19366\n18754\n19248\n19416\n19658\n19638\n19034\n19281\n18844\n18922\n19491\n19272\n19341\n19068\n19332\n19559\n19293\n18804\n18933\n18935\n19405\n18936\n18945\n18943\n18818\n18797\n19570\n19464\n19428\n19093\n19433\n18986\n19161\n19255\n19157\n19046\n19292\n19434\n19298\n18724\n19410\n19694\n19214\n19640\n19189\n18963\n19218\n19585\n19041\n19550\n19123\n19620\n19376\n19561\n18944\n19706\n19056\n19283\n18741\n19319\n19144\n19542\n18821\n19404\n19080\n19303\n18793\n19306\n19678\n19435\n19519\n19566\n19278\n18946\n19536\n19020\n19057\n19198\n19333\n19649\n19699\n19399\n19654\n19136\n19465\n19321\n19577\n18907\n19665\n19386\n19596\n19247\n19473\n19568\n19355\n18925\n19586\n18982\n19616\n19495\n19612\n19023\n19438\n18817\n19692\n19295\n19414\n19676\n19472\n19107\n19062\n19035\n18883\n19409\n19052\n19606\n19091\n19651\n19475\n19413\n18796\n19369\n19639\n19701\n19461\n19645\n19251\n19063\n19679\n19545\n19081\n19363\n18995\n19549\n18790\n18855\n18833\n18899\n19395\n18717\n19647\n18768\n19103\n19245\n18819\n18779\n19656\n19076\n18745\n18971\n19197\n19711\n19074\n19128\n19466\n19139\n19309\n19324\n18814\n19092\n19627\n19060\n18806\n18929\n18737\n18942\n18906\n18858\n19456\n19253\n19716\n19104\n19667\n19574\n18903\n19237\n18864\n19556\n19364\n18952\n19008\n19323\n19700\n19170\n19267\n19345\n19238\n18909\n18892\n19109\n19704\n1890
2\n19275\n19680\n18723\n19242\n19112\n19169\n18956\n19343\n19650\n19541\n19698\n19521\n19087\n18976\n19038\n18775\n18968\n19671\n19412\n19407\n19573\n19027\n18813\n19357\n19460\n19673\n19481\n19036\n19614\n18787\n19195\n18732\n18884\n19613\n19657\n19575\n19226\n19589\n19234\n19617\n19707\n19484\n18740\n19424\n18784\n19419\n19159\n18865\n19105\n19315\n19480\n19664\n19378\n18803\n19605\n18870\n19042\n19426\n18848\n19223\n19509\n19532\n18752\n19691\n18718\n19209\n19362\n19090\n19492\n19567\n19687\n19018\n18830\n19530\n19554\n19119\n19442\n19558\n19527\n19427\n19291\n19543\n19422\n19142\n18897\n18950\n19425\n19002\n19588\n18978\n19551\n18930\n18736\n19101\n19215\n19150\n19263\n18949\n18974\n18759\n19335\n19200\n19129\n19328\n19437\n18988\n19429\n19368\n19406\n19049\n18811\n19296\n19256\n19385\n19602\n18770\n19337\n19580\n19476\n19045\n19132\n19089\n19120\n19265\n19483\n18767\n19227\n18934\n19069\n18820\n19006\n19459\n18927\n19037\n19280\n19441\n18823\n19015\n19114\n19618\n18957\n19176\n18853\n19648\n19201\n19444\n19279\n18751\n19302\n19505\n18733\n19601\n19533\n18863\n19708\n19387\n19346\n19152\n19206\n18851\n19338\n19681\n19380\n19055\n18766\n19085\n19591\n19547\n18958\n19146\n18840\n19051\n19021\n19207\n19235\n19086\n18979\n19300\n18939\n19100\n19619\n19287\n18980\n19277\n19326\n19108\n18920\n19625\n19374\n19078\n18734\n19634\n19339\n18877\n19423\n19652\n19683\n19044\n18983\n19330\n19529\n19714\n19468\n19075\n19540\n18839\n19022\n19286\n19537\n19175\n19463\n19167\n19705\n19562\n19244\n19486\n19611\n18801\n19178\n19590\n18846\n19450\n19205\n19381\n18941\n19670\n19185\n19504\n19633\n18997\n19113\n19397\n19636\n19709\n19289\n19264\n19353\n19584\n19126\n18938\n19669\n18964\n19276\n18774\n19173\n19231\n18973\n18769\n19064\n19040\n19668\n18738\n19082\n19655\n19236\n19352\n19609\n19628\n18951\n19384\n19122\n18875\n18992\n18753\n19379\n19254\n19301\n19506\n19135\n19010\n19682\n19400\n19579\n19316\n19553\n19208\n19635\n19644\n18891\n19024\n18989\n19250\n18850\n19317\n18915\n19607\n18799\n18881\n19479\n19031\n19365\n19164\n18744\n18760\n19502\n19058\n19517\n18735\n19448\n19243\n19453\n19285\n18857\n19439\n19016\n18975\n19503\n18998\n18981\n19186\n18994\n19240\n19631\n19070\n19174\n18900\n19065\n19220\n19229\n18880\n19308\n19372\n19496\n18771\n19325\n19538\n19033\n18874\n19077\n19211\n18764\n19458\n19571\n19121\n19019\n19059\n19497\n18969\n19666\n19297\n19219\n19622\n19184\n18977\n19702\n19539\n19329\n19095\n19675\n18972\n19514\n19703\n19188\n18866\n18812\n19314\n18822\n18845\n19494\n19411\n18916\n19686\n18967\n19294\n19143\n19204\n18805\n19689\n19233\n18758\n18748\n19011\n19685\n19336\n19608\n19454\n19124\n18868\n18807\n19544\n19621\n19228\n19154\n19141\n19145\n19153\n18860\n19163\n19393\n19268\n19160\n19305\n19259\n19471\n19524\n18783\n19396\n18894\n19430\n19690\n19348\n19597\n19592\n19677\n18889\n19331\n18773\n19137\n19009\n18932\n19599\n18816\n19054\n19067\n19477\n19191\n18921\n18940\n19578\n19183\n19004\n19072\n19710\n19005\n19610\n18955\n19457\n19148\n18859\n18993\n19642\n19047\n19418\n19535\n19600\n19312\n19039\n19028\n18879\n19003\n19026\n19013\n19149\n19177\n19217\n18987\n19354\n19525\n19202\n19084\n19032\n18749\n18867\n19048\n18999\n19260\n19630\n18727\n19356\n19083\n18926\n18789\n19370\n18861\n19311\n19557\n19531\n19436\n19140\n19310\n19501\n18721\n19604\n19713\n19262\n19563\n19507\n19440\n19572\n19513\n19515\n19518\n19421\n19470\n19499\n19663\n19508\n18871\n19528\n19500\n19307\n19288\n19594\n19271\n"
  },
  {
    "path": "inits.py",
    "content": "import tensorflow as tf\nimport numpy as np\n\n\ndef uniform(shape, scale=0.05, name=None):\n    \"\"\"Uniform init.\"\"\"\n    initial = tf.random_uniform(shape, minval=-scale, maxval=scale, dtype=tf.float32)\n    return tf.Variable(initial, name=name)\n\n\ndef glorot(shape, name=None):\n    \"\"\"Glorot & Bengio (AISTATS 2010) init.\"\"\"\n    init_range = np.sqrt(6.0/(shape[0]+shape[1]))\n    initial = tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=tf.float32)\n    return tf.Variable(initial, name=name)\n\n\ndef zeros(shape, name=None):\n    \"\"\"All zeros.\"\"\"\n    initial = tf.zeros(shape, dtype=tf.float32)\n    return tf.Variable(initial, name=name)\n\n\ndef ones(shape, name=None):\n    \"\"\"All ones.\"\"\"\n    initial = tf.ones(shape, dtype=tf.float32)\n    return tf.Variable(initial, name=name)"
  },
  {
    "path": "lanczos.py",
    "content": "import numpy as np\nfrom numpy.linalg import norm\nfrom utils import load_data as dataload\nimport scipy.sparse as sparse\nimport pickle\nfrom scipy.linalg import qr, svd\n\ndef lanczos(A,k,q):\n    n = A.shape[0]\n    Q = np.zeros((n,k+1))\n\n    Q[:,0] = q/norm(q)\n\n    alpha = 0\n    beta = 0\n\n    for i in range(k):\n      if i == 0:\n        q = np.dot(A,Q[:,i])\n      else:\n        q = np.dot(A, Q[:,i]) - beta*Q[:,i-1]\n      alpha = np.dot(q.T, Q[:,i])\n      q = q - Q[:,i]*alpha\n      q = q - np.dot(Q[:,:i], np.dot(Q[:,:i].T, q)) # full reorthogonalization\n      beta = norm(q)\n      Q[:,i+1] = q/beta\n      print(i)\n\n    Q = Q[:,:k]\n\n    Sigma = np.dot(Q.T, np.dot(A, Q))\n    # A2 = np.dot(Q[:,:k], np.dot(Sigma[:k,:k], Q[:,:k].T))\n    # return A2\n    return Q, Sigma\n\ndef dense_RandomSVD(A,K):\n    G = np.random.randn(A.shape[0],K)\n    B = np.dot(A,G)\n    Q,R =qr(B,mode='economic')\n    M = np.dot(np.dot(Q, np.dot(np.dot(Q.T, A),Q)),Q.T)\n    return M\n\n\nif __name__==\"__main__\":\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = dataload('cora')\n    print(adj.shape)\n    adj = np.array(sparse.csr_matrix.todense(adj))\n    # np.save(\"ADJ_cora.npy\",adj)\n    q = np.random.randn(adj.shape[0],)\n    Q, sigma = lanczos(adj,100,q)\n    r = 100\n    A2 = np.dot(Q[:,:r], np.dot(sigma[:r,:r], Q[:,:r].T))\n\n    # u,v,a = svd(adj)\n\n    err = norm(adj-A2)/norm(adj)\n    print(err)\n\n\n# A = np.random.random((10000,10000))\n# A = np.triu(A) + np.triu(A).T\n# q = np.random.random((10000,))\n# K = 100\n# Q, sigma = lanczos(A,K,q)\n# r = 100\n# A2 = np.dot(Q[:,:r], np.dot(sigma[:r,:r], Q[:,:r].T))\n# err = norm(A-A2)/norm(A)\n# print(err)\n"
  },
  {
    "path": "layers.py",
    "content": "from inits import *\nimport tensorflow as tf\n\nflags = tf.app.flags\nFLAGS = flags.FLAGS\n\n# global unique layer ID dictionary for layer name assignment\n_LAYER_UIDS = {}\n\n\ndef get_layer_uid(layer_name=''):\n    \"\"\"Helper function, assigns unique layer IDs.\"\"\"\n    if layer_name not in _LAYER_UIDS:\n        _LAYER_UIDS[layer_name] = 1\n        return 1\n    else:\n        _LAYER_UIDS[layer_name] += 1\n        return _LAYER_UIDS[layer_name]\n\n\ndef sparse_dropout(x, keep_prob, noise_shape):\n    \"\"\"Dropout for sparse tensors.\"\"\"\n    random_tensor = keep_prob\n    random_tensor += tf.random_uniform(noise_shape)\n    dropout_mask = tf.cast(tf.floor(random_tensor), dtype=tf.bool)\n    pre_out = tf.sparse_retain(x, dropout_mask)\n    return pre_out * (1./keep_prob)\n\n\ndef dot(x, y, sparse=False):\n    \"\"\"Wrapper for tf.matmul (sparse vs dense).\"\"\"\n    if sparse:\n        res = tf.sparse_tensor_dense_matmul(x, y)\n    else:\n        res = tf.matmul(x, y)\n    return res\n\n\nclass Layer(object):\n    \"\"\"Base layer class. Defines basic API for all layer objects.\n    Implementation inspired by keras (http://keras.io).\n\n    # Properties\n        name: String, defines the variable scope of the layer.\n        logging: Boolean, switches Tensorflow histogram logging on/off\n\n    # Methods\n        _call(inputs): Defines computation graph of layer\n            (i.e. takes input, returns output)\n        __call__(inputs): Wrapper for _call()\n        _log_vars(): Log all variables\n    \"\"\"\n\n    def __init__(self, **kwargs):\n        allowed_kwargs = {'name', 'logging'}\n        for kwarg in kwargs.keys():\n            assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg\n        name = kwargs.get('name')\n        if not name:\n            layer = self.__class__.__name__.lower()\n            name = layer + '_' + str(get_layer_uid(layer))\n        self.name = name\n        self.vars = {}\n        logging = kwargs.get('logging', False)\n        self.logging = logging\n        self.sparse_inputs = False\n\n    def _call(self, inputs):\n        return inputs\n\n    def __call__(self, inputs):\n        with tf.name_scope(self.name):\n            if self.logging and not self.sparse_inputs:\n                tf.summary.histogram(self.name + '/inputs', inputs)\n            outputs = self._call(inputs)\n            if self.logging:\n                tf.summary.histogram(self.name + '/outputs', outputs)\n            return outputs\n\n    def _log_vars(self):\n        for var in self.vars:\n            tf.summary.histogram(self.name + '/vars/' + var, self.vars[var])\n\n\nclass Dense(Layer):\n    \"\"\"Dense layer.\"\"\"\n    def __init__(self, input_dim, output_dim, placeholders, dropout=0., sparse_inputs=False,\n                 act=tf.nn.relu, bias=False, featureless=False, **kwargs):\n        super(Dense, self).__init__(**kwargs)\n\n        if dropout:\n            self.dropout = placeholders['dropout']\n        else:\n            self.dropout = 0.\n\n        self.act = act\n        self.sparse_inputs = sparse_inputs\n        self.featureless = featureless\n        self.bias = bias\n\n        # helper variable for sparse dropout\n        self.num_features_nonzero = placeholders['num_features_nonzero']\n\n        with tf.variable_scope(self.name + '_vars'):\n            self.vars['weights'] = glorot([input_dim, output_dim],\n                                          name='weights')\n            if self.bias:\n                self.vars['bias'] = 
zeros([output_dim], name='bias')\n\n        if self.logging:\n            self._log_vars()\n\n    def _call(self, inputs):\n        x = inputs\n\n        # dropout\n        if self.sparse_inputs:\n            x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)\n        else:\n            x = tf.nn.dropout(x, 1-self.dropout)\n\n        # transform\n        output = dot(x, self.vars['weights'], sparse=self.sparse_inputs)\n\n        # bias\n        if self.bias:\n            output += self.vars['bias']\n\n        return self.act(output)\n\n\n\nclass GraphConvolution(Layer):\n    \"\"\"Graph convolution layer.\"\"\"\n    def __init__(self, input_dim, output_dim, placeholders, dropout=0.,\n                 support=None, sparse_inputs=False, act=tf.nn.relu, bias=False,\n                 featureless=False, **kwargs):\n        super(GraphConvolution, self).__init__(**kwargs)\n\n        if dropout:\n            self.dropout = placeholders['dropout']\n        else:\n            self.dropout = 0.\n\n        self.act = act\n        if support is None:\n            self.support = placeholders['support'][0]\n        else:\n            self.support = support\n        self.sparse_inputs = sparse_inputs\n        self.featureless = featureless\n        self.bias = bias\n\n        # helper variable for sparse dropout\n        self.num_features_nonzero = placeholders['num_features_nonzero']\n\n        with tf.variable_scope(self.name + '_vars'):\n            self.vars['weights_0'] = glorot([input_dim, output_dim],\n                                            name='weights_0')\n            if self.bias:\n                self.vars['bias'] = zeros([output_dim], name='bias')\n\n        if self.logging:\n            self._log_vars()\n\n    def _call(self, inputs):\n        x = inputs\n\n        # dropout\n        if self.sparse_inputs:\n            x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)\n        else:\n            x = tf.nn.dropout(x, 1-self.dropout)\n\n        # convolve: XW first, then propagate with the (sparse) support matrix\n        if not self.featureless:\n            pre_sup = dot(x, self.vars['weights_0'],\n                          sparse=self.sparse_inputs)\n        else:\n            pre_sup = self.vars['weights_0']\n        output = dot(self.support, pre_sup, sparse=True)\n\n        # bias\n        if self.bias:\n            output += self.vars['bias']\n\n        return self.act(output)\n\n\nclass SampledGraphConvolution(Layer):\n    \"\"\"Graph convolution layer with in-graph importance sampling\n    (experimental; unused by the released models).\"\"\"\n    def __init__(self, input_dim, output_dim, placeholders, dropout=0., rank=100,\n                 support=None, sparse_inputs=False, act=tf.nn.relu, bias=False,\n                 featureless=False, **kwargs):\n        super(SampledGraphConvolution, self).__init__(**kwargs)\n\n        if dropout:\n            self.dropout = placeholders['dropout']\n        else:\n            self.dropout = 0.\n\n        self.act = act\n        if support is None:\n            self.support = placeholders['support'][0]\n        else:\n
            self.support = support\n        self.sparse_inputs = sparse_inputs\n        self.featureless = featureless\n        self.bias = bias\n\n        # helper variable for sparse dropout\n        self.num_features_nonzero = placeholders['num_features_nonzero']\n        self.rank = rank\n\n        with tf.variable_scope(self.name + '_vars'):\n            self.vars['weights_0'] = glorot([input_dim, output_dim],\n                                            name='weights_0')\n            if self.bias:\n                self.vars['bias'] = zeros([output_dim], name='bias')\n\n        if self.logging:\n            self._log_vars()\n\n    def _call(self, inputs):\n        x = inputs\n        # build an importance distribution by mixing feature and support norms\n        norm_x = tf.nn.l2_normalize(x, axis=1)\n        norm_support = tf.nn.l2_normalize(self.support, axis=0)\n        norm_mix = tf.multiply(norm_x, norm_support)  # elementwise mixing\n        norm_mix = norm_mix * tf.reciprocal(tf.reduce_sum(norm_mix))  # normalize so the weights sum to 1\n        sampledIndex = tf.multinomial(tf.log(norm_mix), self.rank)  # sampled columns; computed but not applied yet\n        new_support = dot(self.support, tf.diag(norm_mix), sparse=True)\n\n        # dropout\n        if self.sparse_inputs:\n            x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)\n        else:\n            x = tf.nn.dropout(x, 1-self.dropout)\n\n        # convolve with the reweighted support\n        if not self.featureless:\n            pre_sup = dot(x, self.vars['weights_0'],\n                          sparse=self.sparse_inputs)\n        else:\n            pre_sup = self.vars['weights_0']\n        output = dot(new_support, pre_sup, sparse=True)\n\n        # bias\n        if self.bias:\n            output += self.vars['bias']\n\n        return self.act(output)\n"
  },
  {
    "path": "metrics.py",
    "content": "import tensorflow as tf\n\n\n\ndef masked_softmax_cross_entropy(preds, labels, mask):\n    \"\"\"Softmax cross-entropy loss with masking.\"\"\"\n    loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)\n    mask = tf.cast(mask, dtype=tf.float32)\n    mask /= tf.reduce_mean(mask)\n    loss *= mask\n    return tf.reduce_mean(loss)\n\n\ndef masked_accuracy(preds, labels, mask):\n    \"\"\"Accuracy with masking.\"\"\"\n    correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))\n    accuracy_all = tf.cast(correct_prediction, tf.float32)\n    mask = tf.cast(mask, dtype=tf.float32)\n    mask /= tf.reduce_mean(mask)\n    accuracy_all *= mask\n    return tf.reduce_mean(accuracy_all)\n\n\ndef softmax_cross_entropy(preds, labels):\n    loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)\n    return tf.reduce_mean(loss)\n\ndef accuracy(preds, labels):\n    correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))\n    accuracy_all = tf.cast(correct_prediction, tf.float32)\n    return tf.reduce_mean(accuracy_all)\n\n\n"
  },
  {
    "path": "models.py",
    "content": "from layers import *\nfrom metrics import *\n\nflags = tf.app.flags\nFLAGS = flags.FLAGS\n\n\nclass Model(object):\n    def __init__(self, **kwargs):\n        allowed_kwargs = {'name', 'logging'}\n        for kwarg in kwargs.keys():\n            assert kwarg in allowed_kwargs, 'Invalid keyword argument: ' + kwarg\n        name = kwargs.get('name')\n        if not name:\n            name = self.__class__.__name__.lower()\n        self.name = name\n\n        logging = kwargs.get('logging', False)\n        self.logging = logging\n\n        self.vars = {}\n        self.placeholders = {}\n\n        self.layers = []\n        self.activations = []\n\n        self.inputs = None\n        self.outputs = None\n\n        self.loss = 0\n        self.accuracy = 0\n        self.optimizer = None\n        self.opt_op = None\n\n    def _build(self):\n        raise NotImplementedError\n\n    def build(self):\n        \"\"\" Wrapper for _build() \"\"\"\n        with tf.variable_scope(self.name):\n            self._build()\n\n        # Build sequential layer model\n        self.activations.append(self.inputs)\n        for layer in self.layers:\n            hidden = layer(self.activations[-1])\n            self.activations.append(hidden)\n        self.outputs = self.activations[-1]\n\n        # Store model variables for easy access\n        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)\n        self.vars = {var.name: var for var in variables}\n\n        # Build metrics\n        self._loss()\n        self._accuracy()\n\n        self.opt_op = self.optimizer.minimize(self.loss)\n\n    def predict(self):\n        pass\n\n    def _loss(self):\n        raise NotImplementedError\n\n    def _accuracy(self):\n        raise NotImplementedError\n\n    def save(self, sess=None):\n        if not sess:\n            raise AttributeError(\"TensorFlow session not provided.\")\n        saver = tf.train.Saver(self.vars)\n        save_path = saver.save(sess, \"tmp/%s.ckpt\" % self.name)\n        print(\"Model saved in file: %s\" % save_path)\n\n    def load(self, sess=None):\n        if not sess:\n            raise AttributeError(\"TensorFlow session not provided.\")\n        saver = tf.train.Saver(self.vars)\n        save_path = \"tmp/%s.ckpt\" % self.name\n        saver.restore(sess, save_path)\n        print(\"Model restored from file: %s\" % save_path)\n\n\nclass MLP(Model):\n    def __init__(self, placeholders, input_dim, **kwargs):\n        super(MLP, self).__init__(**kwargs)\n\n        self.inputs = placeholders['features']\n        self.input_dim = input_dim\n        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions\n        self.output_dim = placeholders['labels'].get_shape().as_list()[1]\n        self.placeholders = placeholders\n\n        self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)\n\n        self.build()\n\n    def _loss(self):\n        # Weight decay loss\n        for var in self.layers[0].vars.values():\n            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)\n\n        # Cross entropy error\n        self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],\n                                                  self.placeholders['labels_mask'])\n\n    def _accuracy(self):\n        self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],\n                                        self.placeholders['labels_mask'])\n\n    def _build(self):\n 
       self.layers.append(Dense(input_dim=self.input_dim,\n                                 output_dim=FLAGS.hidden1,\n                                 placeholders=self.placeholders,\n                                 act=tf.nn.relu,\n                                 dropout=True,\n                                 sparse_inputs=True,\n                                 logging=self.logging))\n\n        self.layers.append(Dense(input_dim=FLAGS.hidden1,\n                                 output_dim=self.output_dim,\n                                 placeholders=self.placeholders,\n                                 act=lambda x: x,\n                                 dropout=True,\n                                 logging=self.logging))\n\n    def predict(self):\n        return tf.nn.softmax(self.outputs)\n\n\n\n\nclass GCN(Model):\n    def __init__(self, placeholders, input_dim, **kwargs):\n        super(GCN, self).__init__(**kwargs)\n\n        self.inputs = placeholders['features']\n        self.input_dim = input_dim\n        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions\n        self.output_dim = placeholders['labels'].get_shape().as_list()[1]\n        self.placeholders = placeholders\n\n        self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)\n\n        self.build()\n\n    def _loss(self):\n        # Weight decay loss\n        for var in self.layers[0].vars.values():\n            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)\n\n        # Cross entropy error\n        self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],\n                                                  self.placeholders['labels_mask'])\n\n    def _accuracy(self):\n        self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],\n                                        self.placeholders['labels_mask'])\n\n    def _build(self):\n\n        self.layers.append(GraphConvolution(input_dim=self.input_dim,\n                                            output_dim=FLAGS.hidden1,\n                                            placeholders=self.placeholders,\n                                            act=tf.nn.relu,\n                                            dropout=True,\n                                            sparse_inputs=True,\n                                            logging=self.logging))\n\n        self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,\n                                            output_dim=self.output_dim,\n                                            placeholders=self.placeholders,\n                                            act=lambda x: x,\n                                            dropout=True,\n                                            logging=self.logging))\n\n    def predict(self):\n        return tf.nn.softmax(self.outputs)\n\n\n\nclass GCN_APPRO(Model):\n    def __init__(self, placeholders, input_dim, **kwargs):\n        super(GCN_APPRO, self).__init__(**kwargs)\n        self.inputs = placeholders['features']\n        self.input_dim = input_dim\n        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions\n        self.output_dim = placeholders['labels'].get_shape().as_list()[1]\n        self.placeholders = placeholders\n        self.supports = placeholders['support']\n\n        self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)\n\n        self.build()\n\n    def 
_loss(self):\n        # Weight decay loss\n        for var in self.layers[0].vars.values():\n            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)\n\n        # Cross entropy error\n        self.loss += softmax_cross_entropy(self.outputs, self.placeholders['labels'])\n\n    def _accuracy(self):\n        self.accuracy = accuracy(self.outputs, self.placeholders['labels'])\n\n    def _build(self):\n        # appr_support = self.placeholders['support'][0]\n        self.layers.append(GraphConvolution(input_dim=self.input_dim,\n                                            output_dim=FLAGS.hidden1,\n                                            placeholders=self.placeholders,\n                                            support=self.supports[0],\n                                            act=tf.nn.relu,\n                                            dropout=True,\n                                            sparse_inputs=False,\n                                            logging=self.logging))\n\n        self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,\n                                            output_dim=self.output_dim,\n                                            placeholders=self.placeholders,\n                                            support=self.supports[1],\n                                            act=lambda x: x,\n                                            dropout=True,\n                                            logging=self.logging))\n\n    def predict(self):\n        return tf.nn.softmax(self.outputs)\n\n\nclass GCN_APPRO_Mix(Model): #mixture of dense and gcn\n    def __init__(self, placeholders, input_dim, **kwargs):\n        super(GCN_APPRO_Mix, self).__init__(**kwargs)\n        self.inputs = placeholders['AXfeatures']# A*X for the bottom layer, not original feature X\n        self.input_dim = input_dim\n        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions\n        self.output_dim = placeholders['labels'].get_shape().as_list()[1]\n        self.placeholders = placeholders\n        self.support = placeholders['support']\n\n        self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)\n\n        self.build()\n\n    def _loss(self):\n        # Weight decay loss\n        for var in self.layers[0].vars.values():\n            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)\n\n        # Cross entropy error\n        self.loss += softmax_cross_entropy(self.outputs, self.placeholders['labels'])\n\n\n    def _accuracy(self):\n        self.accuracy = accuracy(self.outputs, self.placeholders['labels'])\n\n    def _build(self):\n        self.layers.append(Dense(input_dim=self.input_dim,\n                                 output_dim=FLAGS.hidden1,\n                                 placeholders=self.placeholders,\n                                 act=tf.nn.relu,\n                                 dropout=True,\n                                 sparse_inputs=False,\n                                 logging=self.logging))\n\n        self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,\n                                            output_dim=self.output_dim,\n                                            placeholders=self.placeholders,\n                                            support=self.support,\n                                            act=lambda x: x,\n                                            dropout=True,\n                                            
logging=self.logging))\n\n    def predict(self):\n        return tf.nn.softmax(self.outputs)\n\n\n\nclass GCN_APPRO_Onelayer(Model):\n    def __init__(self, placeholders, input_dim, **kwargs):\n        super(GCN_APPRO_Onelayer, self).__init__(**kwargs)\n        self.inputs = placeholders['features']\n        self.input_dim = input_dim\n        # self.input_dim = self.inputs.get_shape().as_list()[1]  # To be supported in future Tensorflow versions\n        self.output_dim = placeholders['labels'].get_shape().as_list()[1]\n        self.placeholders = placeholders\n        self.supports = placeholders['support']\n\n        self.optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)\n\n        self.build()\n\n    def _loss(self):\n        # Weight decay loss\n        for var in self.layers[0].vars.values():\n            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)\n\n        # Cross entropy error\n        self.loss += masked_softmax_cross_entropy(self.outputs, self.placeholders['labels'],\n                                                  self.placeholders['labels_mask'])\n\n    def _accuracy(self):\n        self.accuracy = masked_accuracy(self.outputs, self.placeholders['labels'],\n                                        self.placeholders['labels_mask'])\n\n    def _build(self):\n        self.layers.append(GraphConvolution(input_dim=self.input_dim,\n                                            output_dim=self.output_dim,\n                                            placeholders=self.placeholders,\n                                            support=self.supports[0],\n                                            act=tf.nn.relu,\n                                            dropout=True,\n                                            sparse_inputs=True,\n                                            logging=self.logging))\n\n    def predict(self):\n        return tf.nn.softmax(self.outputs)\n"
  },
  {
    "path": "pubmed-original_inductive_FastGCN.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\nimport os\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn_mix', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 100, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 10, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef main(rank1):\n\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data_original(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:, train_index]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    y_val = y_val[val_index]\n    test_index = np.where(test_mask)[0]\n    y_test = y_test[test_index]\n\n    train_val_index = np.concatenate([train_index, val_index],axis=0)\n    train_test_idnex = np.concatenate([train_index, test_index],axis=0)\n\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        # normADJ = nontuple_preprocess_adj(adj)\n        normADJ_val = nontuple_preprocess_adj(adj[train_val_index,:][:,train_val_index])\n        normADJ_test = nontuple_preprocess_adj(adj[train_test_idnex,:][:,train_test_idnex])\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    val_features = normADJ_val.dot(features[train_val_index])\n    test_features = normADJ_test.dot(features[train_test_idnex])\n\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = 
len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ_val[len(train_index):, :])\n    testSupport = sparse_to_tuple(normADJ_test[len(train_index):, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=20, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n                distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n                if rank1 > len(distr):\n                    q1 = distr\n                else:\n                    q1 = np.random.choice(distr, rank1, replace=False, p=p0[distr]/sum(p0[distr]))  # top layer\n\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1))))\n                if len(support1[1])==0:\n                    continue\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n +1\n\n        # Validation\n        cost, acc, duration = evaluate(val_features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        # if epoch > 50 and acc>maxACC:\n        #     maxACC = acc\n        #     save_path = saver.save(sess, \"tmp/tmp_MixModel.ckpt\")\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch%5==0:\n            # Validation\n            test_cost, test_acc, test_duration = 
evaluate(test_features, testSupport, y_test,\n                                                          placeholders)\n            print(\"training time by far=\", \"{:.5f}\".format(time.time() - t),\n                  \"epoch = {}\".format(epoch + 1),\n                  \"cost=\", \"{:.5f}\".format(test_cost),\n                  \"accuracy=\", \"{:.5f}\".format(test_acc))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n\n    train_duration = time.time() - t\n    # Testing\n    # if os.path.exists(\"tmp/pubmed_MixModel.ckpt\"):\n    #     saver.restore(sess, \"tmp/pubmed_MixModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(test_features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    # main(None)\n    main(100)\n    # for k in [5, 10, 25, 50]:\n    #     main(k)"
  },
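  {
    "path": "examples/sketch_importance_sampling.py",
    "content": "'''Illustrative sketch (not used by the training scripts) of the importance-sampling step shared by the sampleA-style trainers in this repository. column_prop here is a local stand-in that uses squared column norms, the sampling distribution suggested in the FastGCN paper; the version in utils may differ. The toy graph and sizes are illustrative only.'''\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy as np\nimport scipy.sparse as sp\n\n\ndef column_prop(adj):\n    # squared column norms, normalized into a probability distribution\n    col_norm = np.asarray(adj.multiply(adj).sum(0)).flatten()\n    return col_norm / col_norm.sum()\n\n\nnp.random.seed(123)\nn, rank1 = 8, 4\nA = sp.random(n, n, density=0.5, format='csr') + sp.eye(n)  # self-loops keep every column nonempty\nH = np.random.rand(n, 3)\n\np0 = column_prop(A)\nq1 = np.random.choice(n, rank1, replace=False, p=p0)\n# rescaling by 1/(p0[q1]*rank1) keeps the sampled product an (approximately,\n# since sampling is without replacement) unbiased estimate of A.dot(H)\nA_sampled = A[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1)))\nprint(np.linalg.norm(A_sampled.dot(H[q1, :]) - A.dot(H)))\n"
  },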
  {
    "path": "pubmed-original_transductive_FastGCN.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\nimport os\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn_mix', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 100, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 10, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef main(rank1):\n\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data_original(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    y_val = y_val[val_index]\n    test_index = np.where(test_mask)[0]\n    y_test = y_test[test_index]\n\n    train_val_index = np.concatenate([train_index, val_index],axis=0)\n    train_test_index = np.concatenate([train_index, test_index],axis=0)\n\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n\n    if FLAGS.model == 'gcn_mix':\n\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_train = nontuple_preprocess_adj(adj_train)\n        # normADJ_val = nontuple_preprocess_adj(adj[train_val_index,:][:])\n        # normADJ_test = nontuple_preprocess_adj(adj[train_test_idnex,:][:])\n        normADJ_train = normADJ[train_index,:][:]\n        normADJ_val = normADJ[train_val_index, :][:]\n        normADJ_test = normADJ[train_test_index, :][:]\n\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    ax_features = normADJ.dot(features[:])\n    # val_features = normADJ_val.dot(features[train_val_index])\n    # test_features = normADJ_test.dot(features[train_test_idnex])\n\n   
 nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = len(np.nonzero(ax_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'labels_mask': tf.placeholder(tf.int32),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ_val[len(train_index):, :])\n    testSupport = sparse_to_tuple(normADJ_test[len(train_index):, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=20, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = ax_features\n            else:\n                distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n                if rank1 > len(distr):\n                    q1 = distr\n                else:\n                    q1 = np.random.choice(distr, rank1, replace=False, p=p0[distr]/sum(p0[distr]))  # top layer\n\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1))))\n                if len(support1[1])==0:\n                    continue\n                features_inputs = ax_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n +1\n\n        # Validation\n        cost, acc, duration = evaluate(ax_features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        # if epoch > 50 and acc>maxACC:\n        #     maxACC = acc\n        #     save_path = saver.save(sess, \"tmp/tmp_MixModel.ckpt\")\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() 
- t1)/n))\n\n        # if epoch%5==0:\n        #     # Validation\n        #     test_cost, test_acc, test_duration = evaluate(ax_features, testSupport, y_test,\n        #                                                   placeholders)\n        #     print(\"training time by far=\", \"{:.5f}\".format(time.time() - t),\n        #           \"epoch = {}\".format(epoch + 1),\n        #           \"cost=\", \"{:.5f}\".format(test_cost),\n        #           \"accuracy=\", \"{:.5f}\".format(test_acc))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n\n    train_duration = time.time() - t\n    # Testing\n    # if os.path.exists(\"tmp/pubmed_MixModel.ckpt\"):\n    #     saver.restore(sess, \"tmp/pubmed_MixModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(ax_features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    main(400)\n    # main(100)\n    # for k in [5, 10, 25, 50]:\n    #     main(k)"
  },
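  {
    "path": "examples/sketch_transductive_vs_inductive_support.py",
    "content": "'''Illustrative sketch (not used by the training scripts) contrasting the two support constructions in this repository: the transductive script above normalizes the full adjacency once and row-slices it, while the inductive scripts normalize each subgraph separately. The local nontuple_preprocess_adj is assumed to be the usual renormalization D^-1/2 (A+I) D^-1/2 from utils; the toy graph is illustrative.'''\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy as np\nimport scipy.sparse as sp\n\n\ndef nontuple_preprocess_adj(adj):\n    # renormalization trick: add self-loops, then symmetric degree scaling\n    adj = adj + sp.eye(adj.shape[0])\n    d_inv_sqrt = np.power(np.asarray(adj.sum(1)).flatten(), -0.5)\n    d_mat = sp.diags(d_inv_sqrt)\n    return d_mat.dot(adj).dot(d_mat).tocsr()\n\n\nnp.random.seed(123)\nadj = sp.random(10, 10, density=0.3, format='csr')\nadj = adj + adj.T  # symmetrize, as the reddit trainer does\ntrain_index = np.arange(6)\n\n# transductive: normalize the whole graph once, then take row slices\nnormADJ = nontuple_preprocess_adj(adj)\nnormADJ_train_rows = normADJ[train_index, :]\n\n# inductive: restrict to the training subgraph first, then normalize\nnormADJ_train = nontuple_preprocess_adj(adj[train_index, :][:, train_index])\nprint(normADJ_train_rows.shape, normADJ_train.shape)\n"
  },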
  {
    "path": "pubmed_Mix.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\nimport os\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn_mix', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef main(rank1):\n\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:, train_index]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    y_val = y_val[val_index]\n    test_index = np.where(test_mask)[0]\n    y_test = y_test[test_index]\n\n    train_val_index = np.concatenate([train_index, val_index],axis=0)\n    train_test_idnex = np.concatenate([train_index, test_index],axis=0)\n\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        # normADJ = nontuple_preprocess_adj(adj)\n\n\n        normADJ_val = nontuple_preprocess_adj(adj[train_val_index,:][:,train_val_index])\n        normADJ_test = nontuple_preprocess_adj(adj[train_test_idnex,:][:,train_test_idnex])\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    val_features = normADJ_val.dot(features[train_val_index])\n    test_features = normADJ_test.dot(features[train_test_idnex])\n\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = 
len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ_val[len(train_index):, :])\n    testSupport = sparse_to_tuple(normADJ_test[len(train_index):, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=1024, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            p1 = column_prop(normADJ_batch)\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n\n                q1 = np.random.choice(np.arange(numNode_train), rank1, replace=False, p=p1)  # top layer\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))\n\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n +1\n\n\n        # Validation\n        cost, acc, duration = evaluate(val_features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        # if epoch > 50 and acc>maxACC:\n        #     maxACC = acc\n        #     save_path = saver.save(sess, \"tmp/tmp_MixModel.ckpt\")\n\n        # Print results\n        # print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n        #       \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n        #       \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n\n    train_duration = time.time() - t\n    # Testing\n    # if os.path.exists(\"tmp/pubmed_MixModel.ckpt\"):\n  
  #     saver.restore(sess, \"tmp/pubmed_MixModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(test_features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    for k in [25, 50, 100, 200, 400]:\n        main(k)"
  },
  {
    "path": "pubmed_Mix_sampleA.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\nimport os\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn_mix', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef main(rank1):\n\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:, train_index]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    y_val = y_val[val_index]\n    test_index = np.where(test_mask)[0]\n    y_test = y_test[test_index]\n\n    train_val_index = np.concatenate([train_index, val_index],axis=0)\n    train_test_idnex = np.concatenate([train_index, test_index],axis=0)\n\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        # normADJ = nontuple_preprocess_adj(adj)\n        normADJ_val = nontuple_preprocess_adj(adj[train_val_index,:][:,train_val_index])\n        normADJ_test = nontuple_preprocess_adj(adj[train_test_idnex,:][:,train_test_idnex])\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    val_features = normADJ_val.dot(features[train_val_index])\n    test_features = normADJ_test.dot(features[train_test_idnex])\n\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = 
len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ_val[len(train_index):, :])\n    testSupport = sparse_to_tuple(normADJ_test[len(train_index):, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=1024, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n                distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n                if rank1 > len(distr):\n                    q1 = distr\n                else:\n                    q1 = np.random.choice(distr, rank1, replace=False, p=p0[distr]/sum(p0[distr]))  # top layer\n\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1))))\n                if len(support1[1])==0:\n                    continue\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n +1\n\n\n        # Validation\n        cost, acc, duration = evaluate(val_features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        # if epoch > 50 and acc>maxACC:\n        #     maxACC = acc\n        #     save_path = saver.save(sess, \"tmp/tmp_MixModel.ckpt\")\n\n        # Print results\n        # print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n        #       \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n        #       \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > 
np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n\n    train_duration = time.time() - t\n    # Testing\n    # if os.path.exists(\"tmp/pubmed_MixModel.ckpt\"):\n    #     saver.restore(sess, \"tmp/pubmed_MixModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(test_features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    # main(None)\n    main(50)\n    # for k in [25, 50, 100, 200, 400]:\n    #     main(k)"
  },
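  {
    "path": "examples/sketch_precomputed_AX.py",
    "content": "'''Illustrative sketch (not used by the training scripts) of the precomputation behind the AXfeatures placeholder in the Mix trainers: the bottom-layer product normADJ.dot(features) is computed once with scipy, outside the TensorFlow graph, so each training step only multiplies the (sampled) support with the first-layer activations. By associativity the result is unchanged; shapes are illustrative.'''\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy as np\nimport scipy.sparse as sp\n\nnp.random.seed(123)\nn, f, h = 8, 5, 3\nnormADJ = sp.random(n, n, density=0.5, format='csr')\nfeatures = np.random.rand(n, f)\nW1 = np.random.rand(f, h)\n\n# computed once before training (train_features/val_features in the scripts)\nAX = normADJ.dot(features)\n\n# inside the model, layer 1 is then an ordinary dense multiplication\nprint(np.allclose(AX.dot(W1), normADJ.dot(features.dot(W1))))\n"
  },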
  {
    "path": "pubmed_Mix_uniform.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\nimport os\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn_mix', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef main(rank1):\n\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:, train_index]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    y_val = y_val[val_index]\n    test_index = np.where(test_mask)[0]\n    y_test = y_test[test_index]\n\n    train_val_index = np.concatenate([train_index, val_index],axis=0)\n    train_test_idnex = np.concatenate([train_index, test_index],axis=0)\n\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        # normADJ = nontuple_preprocess_adj(adj)\n\n\n        normADJ_val = nontuple_preprocess_adj(adj[train_val_index,:][:,train_val_index])\n        normADJ_test = nontuple_preprocess_adj(adj[train_test_idnex,:][:,train_test_idnex])\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    val_features = normADJ_val.dot(features[train_val_index])\n    test_features = normADJ_test.dot(features[train_test_idnex])\n\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = 
len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ_val[len(train_index):, :])\n    testSupport = sparse_to_tuple(normADJ_test[len(train_index):, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=1024, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            p1 = column_prop(normADJ_batch)\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n                distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n                if rank1 > len(distr):\n                    q1 = distr\n                else:\n                    q1 = np.random.choice(distr, rank1, replace=False)  # top layer\n                # q1 = np.random.choice(np.arange(numNode_train), rank1)  # top layer\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1] * numNode_train / len(q1))\n\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n +1\n\n\n        # Validation\n        cost, acc, duration = evaluate(val_features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        # if epoch > 50 and acc>maxACC:\n        #     maxACC = acc\n        #     save_path = saver.save(sess, \"tmp/tmp_MixModel.ckpt\")\n\n        # Print results\n        # print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n        #       \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n        #       \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > 
np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n\n    train_duration = time.time() - t\n    # Testing\n    # if os.path.exists(\"tmp/pubmed_MixModel.ckpt\"):\n    #     saver.restore(sess, \"tmp/pubmed_MixModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(test_features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    main(5)\n    # for k in [25, 50, 100, 200, 400]:\n    #     main(k)"
  },
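  {
    "path": "examples/sketch_uniform_rescaling.py",
    "content": "'''Illustrative sketch (not used by the training scripts): a Monte-Carlo check of the rescaling in pubmed_Mix_uniform.py. When k columns are drawn uniformly without replacement from n, each column appears with probability k/n, so multiplying the sampled support by n/k makes A[:, q1].dot(H[q1]) an unbiased estimate of A.dot(H). The trial count is illustrative.'''\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy as np\n\nnp.random.seed(123)\nn, k, trials = 12, 4, 20000\nA = np.random.rand(n, n)\nH = np.random.rand(n, 3)\n\nacc = np.zeros((n, 3))\nfor _ in range(trials):\n    q1 = np.random.choice(n, k, replace=False)\n    acc += A[:, q1].dot(H[q1, :]) * n / k\n# the Monte-Carlo mean approaches the exact product as trials grow\nprint(np.abs(acc / trials - A.dot(H)).max())\n"
  },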
  {
    "path": "pubmed_inductive_appr2layers.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN, MLP, GCN_APPRO\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_appr', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\ndef main(rank1, rank0):\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:, train_index]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    # adj_val = adj[val_index, :][:, val_index]\n    val_mask = val_mask[val_index]\n    y_val = y_val[val_index]\n    test_index = np.where(test_mask)[0]\n    # adj_test = adj[test_index, :][:, test_index]\n    test_mask = test_mask[test_index]\n    y_test = y_test[test_index]\n\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n    train_features = features[train_index]\n\n    if FLAGS.model == 'gcn_appr':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Define placeholders\n    placeholders = {\n        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],\n        'features': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'labels_mask': tf.placeholder(tf.int32),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model 
evaluation function\n    def evaluate(features, support, labels, mask, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[val_index, :])]\n    testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[test_index, :])]\n\n    t = time.time()\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train, train_mask], batchsize=256, shuffle=True):\n            [normADJ_batch, y_train_batch, train_mask_batch] = batch\n            if sum(train_mask_batch) < 1:\n                continue\n            p1 = column_prop(normADJ_batch)\n            q1 = np.random.choice(np.arange(numNode_train), rank1, p=p1)  # top layer\n            # q0 = np.random.choice(np.arange(numNode_train), rank0, p=p0)  # bottom layer\n            support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))\n\n            p2 = column_prop(normADJ_train[q1, :])\n            q0 = np.random.choice(np.arange(numNode_train), rank0, p=p2)\n            support0 = sparse_to_tuple(normADJ_train[q1, :][:, q0])\n            features_inputs = sp.diags(1.0 / (p2[q0] * rank0)).dot(train_features[q0, :])  # selected nodes for approximation\n\n\n            # Construct feed dictionary\n            feed_dict = construct_feed_dict(features_inputs, [support0, support1], y_train_batch, train_mask_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n\n        # Validation\n        cost, acc, duration = evaluate(features, valSupport, y_val, val_mask, placeholders)\n        cost_val.append(cost)\n\n        # # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time=\", \"{:.5f}\".format(time.time() - t1))\n\n        if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n    train_duration = time.time() - t\n    # Testing\n    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test, test_mask,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"rank0 = {}\".format(rank0), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)))\n\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    for k in [5, 10, 25, 50]:\n        main(k, k)\n\n    # main(50,50)\n    # for k in [50, 100, 200, 400]:\n    #     main(k, k)"
  },
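  {
    "path": "examples/sketch_two_layer_sampling.py",
    "content": "'''Illustrative sketch (not used by the training scripts) of the two-layer scheme in pubmed_inductive_appr2layers.py: the top layer samples rank1 columns of the batch support, the bottom layer then samples rank0 columns of the rows just selected, and each product is rescaled by 1/(p*rank). column_prop is again a local stand-in using squared column norms; sizes are illustrative.'''\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy as np\nimport scipy.sparse as sp\n\n\ndef column_prop(adj):\n    col_norm = np.asarray(adj.multiply(adj).sum(0)).flatten()\n    return col_norm / col_norm.sum()\n\n\nnp.random.seed(123)\nn, rank1, rank0 = 10, 4, 4\nnormADJ = sp.random(n, n, density=0.5, format='csr') + sp.eye(n)\nbatch = normADJ[:5, :]             # a minibatch of five rows\nH0 = np.random.rand(n, 3)          # bottom-layer inputs\n\np1 = column_prop(batch)\nq1 = np.random.choice(n, rank1, p=p1)    # top layer\nsupport1 = batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1)))\n\np2 = column_prop(normADJ[q1, :])\nq0 = np.random.choice(n, rank0, p=p2)    # bottom layer\nsupport0 = normADJ[q1, :][:, q0]\nfeatures_inputs = sp.diags(1.0 / (p2[q0] * rank0)).dot(H0[q0, :])\nprint(support1.dot(support0.dot(features_inputs)).shape)  # (5, 3)\n"
  },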
  {
    "path": "train.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\n\nfrom utils import *\nfrom models import GCN, MLP\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn', 'Model string.')  # 'gcn', 'gcn_cheby', 'dense'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.5, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 10, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n# Load data\nadj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n# Some preprocessing\nfeatures = preprocess_features(features)\nif FLAGS.model == 'gcn':\n    support = [preprocess_adj(adj)]\n    num_supports = 1\n    model_func = GCN\nelif FLAGS.model == 'gcn_cheby':\n    support = chebyshev_polynomials(adj, FLAGS.max_degree)\n    num_supports = 1 + FLAGS.max_degree\n    model_func = GCN\nelif FLAGS.model == 'dense':\n    support = [preprocess_adj(adj)]  # Not used\n    num_supports = 1\n    model_func = MLP\nelse:\n    raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n# Define placeholders\nplaceholders = {\n    'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],\n    'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(features[2], dtype=tf.int64)),\n    'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n    'labels_mask': tf.placeholder(tf.int32),\n    'dropout': tf.placeholder_with_default(0., shape=()),\n    'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n}\n\n# Create model\nmodel = model_func(placeholders, input_dim=features[2][1], logging=True)\nprint(adj.shape[0])\n\n# Initialize session\nsess = tf.Session()\n\n\n# Define model evaluation function\ndef evaluate(features, support, labels, mask, placeholders):\n    t_test = time.time()\n    feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)\n    outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n    return outs_val[0], outs_val[1], (time.time() - t_test)\n\n\n# Init variables\nsess.run(tf.global_variables_initializer())\n\ncost_val = []\nt_start = time.time()\n# Train model\nfor epoch in range(FLAGS.epochs):\n\n    t = time.time()\n    # Construct feed dictionary\n    feed_dict = construct_feed_dict(features, support, y_train, train_mask, placeholders)\n    feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n    # Training step\n    outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n\n    # Validation\n    cost, acc, duration = evaluate(features, support, y_val, val_mask, placeholders)\n    cost_val.append(cost)\n\n    # Print results\n    print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n          \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n          \"val_acc=\", 
\"{:.5f}\".format(acc), \"time=\", \"{:.5f}\".format(time.time() - t))\n\n    # if epoch % 5 == 0:\n    #     # Validation\n    #     test_cost, test_acc, test_duration = evaluate(features, support, y_test, test_mask, placeholders)\n    #     print(\"training time by far=\", \"{:.5f}\".format(time.time() - t_start),\n    #           \"epoch = {}\".format(epoch + 1),\n    #           \"cost=\", \"{:.5f}\".format(test_cost),\n    #           \"accuracy=\", \"{:.5f}\".format(test_acc))\n\n    if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping+1):-1]):\n        print(\"Early stopping...\")\n        break\n\n# print(\"Optimization Finished!\")\ntrain_duration = time.time()-t_start\n# Testing\ntest_cost, test_acc, test_duration = evaluate(features, support, y_test, test_mask, placeholders)\nprint(\"Original test set results:\", \"cost=\", \"{:.5f}\".format(test_cost),\n      \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time =\", \"{:.5f}\".format(train_duration),\n      \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)),\n      \"test time=\", \"{:.5f}\".format(test_duration))\n"
  },
  {
    "path": "train_batch_multiRank_inductive_newscheme.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN, MLP, GCN_APPRO\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_appr', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 300, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.5, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\nrank1 = 300\nrank0 = 300\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\ndef main(rank1, rank0):\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n    train_index = np.where(train_mask)[0]\n    adj_train = adj[train_index, :][:, train_index]\n    train_mask = train_mask[train_index]\n    y_train = y_train[train_index]\n    val_index = np.where(val_mask)[0]\n    # adj_val = adj[val_index, :][:, val_index]\n    # val_mask = val_mask[val_index]\n    # y_val = y_val[val_index]\n    # test_index = np.where(test_mask)[0]\n    # adj_test = adj[test_index, :][:, test_index]\n    # test_mask = test_mask[test_index]\n    # y_test = y_test[test_index]\n\n    numNode_train = adj_train.shape[0]\n    # print(\"numNode\", numNode)\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n    train_features = features[train_index]\n    if FLAGS.model == 'gcn_appr':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Define placeholders\n    placeholders = {\n        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],\n        'features': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'labels_mask': tf.placeholder(tf.int32),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = 
tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, mask, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n    p1 = mix_prop(normADJ_train, features[train_index, :])\n\n    testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    # valSupport = [sparse_to_tuple(normADJ_val), sparse_to_tuple(normADJ_val)]\n    # testSupport = [sparse_to_tuple(normADJ_test), sparse_to_tuple(normADJ_test)]\n    t = time.time()\n    # Train model\n    for epoch in range(FLAGS.epochs):\n\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train, train_mask], batchsize=50, shuffle=True):\n            [normADJ_batch, y_train_batch, train_mask_batch] = batch\n            if sum(train_mask_batch) < 1:\n                continue\n            # p1 = column_prop(normADJ_batch)\n            q1 = np.random.choice(np.arange(numNode_train), rank1, p=p0)  # top layer\n            q0 = np.random.choice(np.arange(numNode_train), rank0, p=p0)  # bottom layer\n            support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))\n            support0 = sparse_to_tuple(normADJ_train[q1, :][:, q0])\n            # support1 = sparse_to_tuple(normADJ_batch)\n            # support0 = sparse_to_tuple(normADJ[:, q0])\n            features_inputs = sp.diags(1.0 / (p1[q0] * rank0)).dot(train_features[q0, :])  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feed_dict(features_inputs, [support0, support1], y_train_batch, train_mask_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n\n        # Validation\n        cost, acc, duration = evaluate(features, testSupport, y_val, val_mask, placeholders)\n        cost_val.append(cost)\n\n        # # Print results\n        # print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n        #       \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n        #       \"val_acc=\", \"{:.5f}\".format(acc), \"time=\", \"{:.5f}\".format(time.time() - t))\n\n        if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n    train_duration = time.time() - t\n    # Testing\n    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test, test_mask,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"rank0 = {}\".format(rank0), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time per epoch=\", \"{:.5f}\".format(train_duration/(epoch+1)))\n\n\nif __name__==\"__main__\":\n    print(\"DATASET:\", FLAGS.dataset)\n    for k in range(100, 1000, 200):\n        main(k, k)"
  },
  {
    "path": "train_batch_multiRank_inductive_reddit_Mixlayers_sampleA.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\nimport json\nfrom networkx.readwrite import json_graph\nimport os\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 1e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef loadRedditFromG(dataset_dir, inputfile):\n    f= open(dataset_dir+inputfile)\n    objects = []\n    for _ in range(pkl.load(f)):\n        objects.append(pkl.load(f))\n    adj, train_labels, val_labels, test_labels, train_index, val_index, test_index = tuple(objects)\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    return sp.csr_matrix(adj), sp.lil_matrix(feats), train_labels, val_labels, test_labels, train_index, val_index, test_index\n\n\ndef loadRedditFromNPZ(dataset_dir):\n    adj = sp.load_npz(dataset_dir+\"reddit_adj.npz\")\n    data = np.load(dataset_dir+\"reddit.npz\")\n\n    return adj, data['feats'], data['y_train'], data['y_val'], data['y_test'], data['train_index'], data['val_index'], data['test_index']\n\n\n\ndef transferRedditDataFormat(dataset_dir, output_file):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    labels = json.load(open(dataset_dir + \"/reddit-class_map.json\"))\n\n    train_ids = [n for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]\n    test_ids = [n for n in G.nodes() if G.node[n]['test']]\n    val_ids = [n for n in G.nodes() if G.node[n]['val']]\n    train_labels = [labels[i] for i in train_ids]\n    test_labels = [labels[i] for i in test_ids]\n    val_labels = [labels[i] for i in val_ids]\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    ## Logistic gets thrown off by big counts, so log transform num comments and score\n    feats[:, 0] = np.log(feats[:, 0] + 1.0)\n    feats[:, 1] = np.log(feats[:, 1] - min(np.min(feats[:, 1]), -1))\n    feat_id_map = json.load(open(dataset_dir + \"reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n\n    # train_feats = feats[[feat_id_map[id] for id in train_ids]]\n    # test_feats = feats[[feat_id_map[id] for id in test_ids]]\n\n    # numNode = 
len(feat_id_map)\n    # adj = sp.lil_matrix(np.zeros((numNode,numNode)))\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n\n    train_index = [feat_id_map[id] for id in train_ids]\n    val_index = [feat_id_map[id] for id in val_ids]\n    test_index = [feat_id_map[id] for id in test_ids]\n    np.savez(output_file, feats = feats, y_train=train_labels, y_val=val_labels, y_test = test_labels, train_index = train_index,\n             val_index=val_index, test_index = test_index)\n\n\ndef transferLabel2Onehot(labels, N):\n    y = np.zeros((len(labels),N))\n    for i in range(len(labels)):\n        pos = labels[i]\n        y[i,pos] =1\n    return y\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef main(rank1):\n\n\n\n    # config = tf.ConfigProto(device_count={\"CPU\": 4}, # limit to num_cpu_core CPU usage\n    #                 inter_op_parallelism_threads = 1,\n    #                 intra_op_parallelism_threads = 4,\n    #                 log_device_placement=False)\n    adj, features, y_train, y_val, y_test,train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n    adj = adj+adj.T\n\n\n    y_train = transferLabel2Onehot(y_train, 41)\n    y_val = transferLabel2Onehot(y_val, 41)\n    y_test = transferLabel2Onehot(y_test, 41)\n\n    features = sp.lil_matrix(features)\n\n    adj_train = adj[train_index, :][:, train_index]\n\n\n    numNode_train = adj_train.shape[0]\n\n\n    # print(\"numNode\", numNode)\n\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    features = normADJ.dot(features)\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init 
variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ[val_index, :])\n    testSupport = sparse_to_tuple(normADJ[test_index, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=256, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            # p1 = column_prop(normADJ_batch)\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n                distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n                if rank1 > len(distr):\n                    q1 = distr\n                else:\n                    q1 = np.random.choice(distr, rank1, replace=False, p=p0[distr]/sum(p0[distr]))  # top layer\n\n                # q1 = np.random.choice(np.arange(numNode_train), rank1, p=p0)  # top layer\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1))))\n                if len(support1[1])==0:\n                    continue\n\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n+1\n\n\n        # Validation\n        cost, acc, duration = evaluate(features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        if epoch > 20 and acc>maxACC:\n            maxACC = acc\n            saver.save(sess, \"tmp/tmp_MixModel_sampleA_full.ckpt\")\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch%5==0:\n            # Validation\n            test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test,\n                                                          placeholders)\n            print(\"training time by far=\", \"{:.5f}\".format(time.time() - t),\n                  \"epoch = {}\".format(epoch + 1),\n                  \"cost=\", \"{:.5f}\".format(test_cost),\n                  \"accuracy=\", \"{:.5f}\".format(test_acc))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n    train_duration = time.time() - t\n    # Testing\n    if os.path.exists(\"tmp/tmp_MixModel_sampleA_full.ckpt.index\"):\n        saver.restore(sess, \"tmp/tmp_MixModel_sampleA_full.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = 
{}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration),\n          \"epoch = {}\".format(epoch+1),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\ndef transferG2ADJ():\n    G = json_graph.node_link_graph(json.load(open(\"reddit/reddit-G.json\")))\n    feat_id_map = json.load(open(\"reddit/reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n    numNode = len(feat_id_map)\n    adj = np.zeros((numNode, numNode))\n    newEdges0 = [feat_id_map[edge[0]] for edge in G.edges()]\n    newEdges1 = [feat_id_map[edge[1]] for edge in G.edges()]\n\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n    adj = sp.csr_matrix((np.ones((len(newEdges0),)), (newEdges0, newEdges1)), shape=(numNode, numNode))\n    sp.save_npz(\"reddit_adj.npz\", adj)\n\n\ndef test(rank1=None):\n    # config = tf.ConfigProto(device_count={\"CPU\": 4}, # limit to num_cpu_core CPU usage\n    #                 inter_op_parallelism_threads = 1,\n    #                 intra_op_parallelism_threads = 4,\n    #                 log_device_placement=False)\n    adj, features, y_train, y_val, y_test, train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n    adj = adj + adj.T\n\n    y_train = transferLabel2Onehot(y_train, 41)\n    y_test = transferLabel2Onehot(y_test, 41)\n\n    features = sp.lil_matrix(features)\n\n\n    numNode_train = y_train.shape[0]\n\n    # print(\"numNode\", numNode)\n\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ = nontuple_preprocess_adj(adj)\n        normADJ_test = normADJ[test_index, :]\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    features = normADJ.dot(features)\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32),\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    saver.restore(sess, \"tmp/tmp_MixModel_sampleA.ckpt\")\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_test)\n\n\n    t = time.time()\n\n    if rank1 is None:\n        support1 = sparse_to_tuple(normADJ_test)\n        features_inputs = features\n    else:\n        distr = np.nonzero(np.sum(normADJ_test, axis=0))[1]\n        if rank1 > 
len(distr):\n            q1 = distr\n        else:\n            q1 = np.random.choice(distr, rank1, replace=False, p=p0[distr] / sum(p0[distr]))  # top layer\n\n        # q1 = np.random.choice(np.arange(numNode_train), rank1, p=p0)  # top layer\n\n        support1 = sparse_to_tuple(normADJ_test[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1))))\n\n\n        features_inputs = features[q1, :]  # selected nodes for approximation\n\n    test_cost, test_acc, test_duration = evaluate(features_inputs, support1, y_test,\n                                                  placeholders)\n\n\n    test_duration = time.time() - t\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    # main(None)\n    main(None)\n    # for k in [25, 50, 100, 200, 400]:\n    #     main(k)"
  },
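  {
    "path": "examples/demo_importance_sampling.py",
    "content": "# Hypothetical demo, not part of the original FastGCN code: a minimal,\n# self-contained sketch of the one-layer importance-sampling step used in\n# train_batch_multiRank_inductive_reddit_Mixlayers_sampleA.py. The toy random\n# adjacency below is an assumption; column_prop mirrors utils.column_prop.\nimport numpy as np\nimport scipy.sparse as sp\nfrom scipy.sparse.linalg import norm as sparsenorm\n\n\ndef column_prop(adj):\n    # sampling distribution proportional to the column norms of adj\n    column_norm = sparsenorm(adj, axis=0)\n    return column_norm / sum(column_norm)\n\n\nif __name__ == '__main__':\n    np.random.seed(0)\n    rank1 = 4\n    normADJ_batch = sp.random(8, 32, density=0.3, format='csr')\n    p0 = column_prop(normADJ_batch)\n    # restrict sampling to columns that actually touch the batch, as in sampleA\n    distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n    q1 = np.random.choice(distr, rank1, replace=False, p=p0[distr] / sum(p0[distr]))\n    # rescale each sampled column by 1/(p*rank1) so the product with the\n    # sampled rows of the features is unbiased in expectation\n    support1 = normADJ_batch[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1)))\n    print(support1.shape)  # (8, 4)"
  },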
  {
    "path": "train_batch_multiRank_inductive_reddit_Mixlayers_sampleBatch.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN_APPRO_Mix\nimport json\nfrom networkx.readwrite import json_graph\nimport os\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\n\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 1e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 100, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef loadRedditFromG(dataset_dir, inputfile):\n    f= open(dataset_dir+inputfile)\n    objects = []\n    for _ in range(pkl.load(f)):\n        objects.append(pkl.load(f))\n    adj, train_labels, val_labels, test_labels, train_index, val_index, test_index = tuple(objects)\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    return sp.csr_matrix(adj), sp.lil_matrix(feats), train_labels, val_labels, test_labels, train_index, val_index, test_index\n\n\ndef loadRedditFromNPZ(dataset_dir):\n    adj = sp.load_npz(dataset_dir+\"reddit_adj.npz\")\n    data = np.load(dataset_dir+\"reddit.npz\")\n\n    return adj, data['feats'], data['y_train'], data['y_val'], data['y_test'], data['train_index'], data['val_index'], data['test_index']\n\n\n\ndef transferRedditDataFormat(dataset_dir, output_file):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    labels = json.load(open(dataset_dir + \"/reddit-class_map.json\"))\n\n    train_ids = [n for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]\n    test_ids = [n for n in G.nodes() if G.node[n]['test']]\n    val_ids = [n for n in G.nodes() if G.node[n]['val']]\n    train_labels = [labels[i] for i in train_ids]\n    test_labels = [labels[i] for i in test_ids]\n    val_labels = [labels[i] for i in val_ids]\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    ## Logistic gets thrown off by big counts, so log transform num comments and score\n    feats[:, 0] = np.log(feats[:, 0] + 1.0)\n    feats[:, 1] = np.log(feats[:, 1] - min(np.min(feats[:, 1]), -1))\n    feat_id_map = json.load(open(dataset_dir + \"reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n\n    # train_feats = feats[[feat_id_map[id] for id in train_ids]]\n    # test_feats = feats[[feat_id_map[id] for id in test_ids]]\n\n    # numNode = len(feat_id_map)\n    # adj = sp.lil_matrix(np.zeros((numNode,numNode)))\n    # for edge in 
G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n\n    train_index = [feat_id_map[id] for id in train_ids]\n    val_index = [feat_id_map[id] for id in val_ids]\n    test_index = [feat_id_map[id] for id in test_ids]\n    np.savez(output_file, feats = feats, y_train=train_labels, y_val=val_labels, y_test = test_labels, train_index = train_index,\n             val_index=val_index, test_index = test_index)\n\n\ndef transferLabel2Onehot(labels, N):\n    y = np.zeros((len(labels),N))\n    for i in range(len(labels)):\n        pos = labels[i]\n        y[i,pos] =1\n    return y\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef main(rank1):\n\n\n\n    # config = tf.ConfigProto(device_count={\"CPU\": 4}, # limit to num_cpu_core CPU usage\n    #                 inter_op_parallelism_threads = 1,\n    #                 intra_op_parallelism_threads = 4,\n    #                 log_device_placement=False)\n    adj, features, y_train, y_val, y_test,train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n    adj = adj+adj.T\n\n\n    y_train = transferLabel2Onehot(y_train, 41)\n    y_val = transferLabel2Onehot(y_val, 41)\n    y_test = transferLabel2Onehot(y_test, 41)\n\n    features = sp.lil_matrix(features)\n\n    adj_train = adj[train_index, :][:, train_index]\n\n\n    numNode_train = adj_train.shape[0]\n\n\n    # print(\"numNode\", numNode)\n\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    features = normADJ.dot(features)\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = 
tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ[val_index, :])\n    testSupport = sparse_to_tuple(normADJ[test_index, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=256, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            p1 = column_prop(normADJ_batch)\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n\n                q1 = np.random.choice(np.arange(numNode_train), rank1, replace=False, p=p1)  # top layer\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))\n\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n + 1\n\n\n        # Validation\n        cost, acc, duration = evaluate(features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        if epoch > 50 and acc>maxACC:\n            maxACC = acc\n            save_path = saver.save(sess, \"tmp/tmp_MixModel.ckpt\")\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n\n    train_duration = time.time() - t\n    # Testing\n    # the Saver writes tmp_MixModel.ckpt.index plus data files, so check for the .index file\n    if os.path.exists(\"tmp/tmp_MixModel.ckpt.index\"):\n        saver.restore(sess, \"tmp/tmp_MixModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration), \"epoch = {}\".format(epoch+1),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\nif __name__==\"__main__\":\n    # main(100)\n    for k in [25, 50]:\n        main(k)"
  },
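  {
    "path": "examples/demo_batch_column_prop.py",
    "content": "# Hypothetical illustration, not part of the original FastGCN code: in the\n# sampleBatch variant the distribution p1 = column_prop(normADJ_batch) is\n# recomputed per minibatch, so columns not touched by the batch get zero\n# probability and sampling over all train nodes only ever draws neighbors of\n# the batch. The toy sizes below are assumptions.\nimport numpy as np\nimport scipy.sparse as sp\nfrom scipy.sparse.linalg import norm as sparsenorm\n\n\ndef column_prop(adj):\n    column_norm = sparsenorm(adj, axis=0)\n    return column_norm / sum(column_norm)\n\n\nif __name__ == '__main__':\n    np.random.seed(0)\n    numNode_train = 16\n    normADJ_batch = sp.random(4, numNode_train, density=0.5, format='csr')\n    p1 = column_prop(normADJ_batch)\n    q1 = np.random.choice(np.arange(numNode_train), 3, replace=False, p=p1)\n    # every sampled column has nonzero probability, i.e. touches the batch\n    assert (p1[q1] > 0).all()\n    print(q1, p1[q1])"
  },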
  {
    "path": "train_batch_multiRank_inductive_reddit_Mixlayers_uniform.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN, MLP, GCN_APPRO_Mix\nimport json\nfrom networkx.readwrite import json_graph\nimport os\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_mix', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 1e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 100, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef loadRedditFromG(dataset_dir, inputfile):\n    f= open(dataset_dir+inputfile)\n    objects = []\n    for _ in range(pkl.load(f)):\n        objects.append(pkl.load(f))\n    adj, train_labels, val_labels, test_labels, train_index, val_index, test_index = tuple(objects)\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    return sp.csr_matrix(adj), sp.lil_matrix(feats), train_labels, val_labels, test_labels, train_index, val_index, test_index\n\n\ndef loadRedditFromNPZ(dataset_dir):\n    adj = sp.load_npz(dataset_dir+\"reddit_adj.npz\")\n    data = np.load(dataset_dir+\"reddit.npz\")\n\n    return adj, data['feats'], data['y_train'], data['y_val'], data['y_test'], data['train_index'], data['val_index'], data['test_index']\n\n\n\ndef transferRedditDataFormat(dataset_dir, output_file):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    labels = json.load(open(dataset_dir + \"/reddit-class_map.json\"))\n\n    train_ids = [n for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]\n    test_ids = [n for n in G.nodes() if G.node[n]['test']]\n    val_ids = [n for n in G.nodes() if G.node[n]['val']]\n    train_labels = [labels[i] for i in train_ids]\n    test_labels = [labels[i] for i in test_ids]\n    val_labels = [labels[i] for i in val_ids]\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    ## Logistic gets thrown off by big counts, so log transform num comments and score\n    feats[:, 0] = np.log(feats[:, 0] + 1.0)\n    feats[:, 1] = np.log(feats[:, 1] - min(np.min(feats[:, 1]), -1))\n    feat_id_map = json.load(open(dataset_dir + \"reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n\n    # train_feats = feats[[feat_id_map[id] for id in train_ids]]\n    # test_feats = feats[[feat_id_map[id] for id in test_ids]]\n\n    # 
numNode = len(feat_id_map)\n    # adj = sp.lil_matrix(np.zeros((numNode,numNode)))\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n\n    train_index = [feat_id_map[id] for id in train_ids]\n    val_index = [feat_id_map[id] for id in val_ids]\n    test_index = [feat_id_map[id] for id in test_ids]\n    np.savez(output_file, feats = feats, y_train=train_labels, y_val=val_labels, y_test = test_labels, train_index = train_index,\n             val_index=val_index, test_index = test_index)\n\n\ndef transferLabel2Onehot(labels, N):\n    y = np.zeros((len(labels),N))\n    for i in range(len(labels)):\n        pos = labels[i]\n        y[i,pos] =1\n    return y\n\ndef construct_feeddict_forMixlayers(AXfeatures, support, labels, placeholders):\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['AXfeatures']: AXfeatures})\n    feed_dict.update({placeholders['support']: support})\n    feed_dict.update({placeholders['num_features_nonzero']: AXfeatures[1].shape})\n    return feed_dict\n\ndef main(rank1):\n\n\n\n    # config = tf.ConfigProto(device_count={\"CPU\": 4}, # limit to num_cpu_core CPU usage\n    #                 inter_op_parallelism_threads = 1,\n    #                 intra_op_parallelism_threads = 4,\n    #                 log_device_placement=False)\n    adj, features, y_train, y_val, y_test,train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n    adj = adj+adj.T\n\n    y_train = transferLabel2Onehot(y_train, 41)\n    y_val = transferLabel2Onehot(y_val, 41)\n    y_test = transferLabel2Onehot(y_test, 41)\n\n    features = sp.lil_matrix(features)\n\n    adj_train = adj[train_index, :][:, train_index]\n\n\n    numNode_train = adj_train.shape[0]\n\n\n    # print(\"numNode\", numNode)\n\n\n\n    if FLAGS.model == 'gcn_mix':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        model_func = GCN_APPRO_Mix\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n\n    train_features = normADJ_train.dot(features[train_index])\n    features = normADJ.dot(features)\n    nonzero_feature_number = len(np.nonzero(features)[0])\n    nonzero_feature_number_train = len(np.nonzero(train_features)[0])\n\n\n    # Define placeholders\n    placeholders = {\n        'support': tf.sparse_placeholder(tf.float32) ,\n        'AXfeatures': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n    saver = tf.train.Saver()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feeddict_forMixlayers(features, support, labels, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - 
t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n\n    cost_val = []\n\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = sparse_to_tuple(normADJ[val_index, :])\n    testSupport = sparse_to_tuple(normADJ[test_index, :])\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train], batchsize=256, shuffle=True):\n            [normADJ_batch, y_train_batch] = batch\n\n            if rank1 is None:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = train_features\n            else:\n                distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n                if rank1 > len(distr):\n                    q1 = distr\n                else:\n                    q1 = np.random.choice(distr, rank1, replace=False)  # top layer\n                # q1 = np.random.choice(np.arange(numNode_train), rank1)  # top layer\n\n                support1 = sparse_to_tuple(normADJ_batch[:, q1]*numNode_train/len(q1))\n\n                features_inputs = train_features[q1, :]  # selected nodes for approximation\n            # Construct feed dictionary\n            feed_dict = construct_feeddict_forMixlayers(features_inputs, support1, y_train_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n = n + 1\n\n\n        # Validation\n        cost, acc, duration = evaluate(features, valSupport, y_val,  placeholders)\n        cost_val.append(cost)\n\n        if epoch > 50 and acc > maxACC:\n            maxACC = acc\n            save_path = saver.save(sess, \"tmp/tmp_MixModel_uniform.ckpt\")\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n    train_duration = time.time() - t\n    # Testing\n    # the Saver writes tmp_MixModel_uniform.ckpt.index plus data files, so check for the .index file\n    if os.path.exists(\"tmp/tmp_MixModel_uniform.ckpt.index\"):\n        saver.restore(sess, \"tmp/tmp_MixModel_uniform.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration),\n          \"epoch = {}\".format(epoch + 1),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\n\ndef transferG2ADJ():\n    G = json_graph.node_link_graph(json.load(open(\"reddit/reddit-G.json\")))\n    feat_id_map = json.load(open(\"reddit/reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n    numNode = len(feat_id_map)\n    # build the adjacency directly in sparse form; a dense numNode x numNode\n    # array would not fit in memory for a graph of Reddit's size\n    newEdges0 = [feat_id_map[edge[0]] for edge in G.edges()]\n    newEdges1 = [feat_id_map[edge[1]] for edge in 
G.edges()]\n\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n    adj = sp.csr_matrix((np.ones((len(newEdges0),)), (newEdges0, newEdges1)), shape=(numNode, numNode))\n    sp.save_npz(\"reddit_adj.npz\", adj)\n\nif __name__==\"__main__\":\n\n    main(50)\n    # for k in [25, 50, 100, 200, 400]:\n    #     main(k)"
  },
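  {
    "path": "examples/demo_uniform_estimator.py",
    "content": "# Hypothetical sketch, not part of the original FastGCN code, of the uniform\n# estimator in train_batch_multiRank_inductive_reddit_Mixlayers_uniform.py:\n# with q1 drawn uniformly, (A[:, q1] * N / len(q1)).dot(H[q1]) is an unbiased\n# Monte-Carlo estimate of A.dot(H), which is why the script rescales the\n# sampled support by numNode_train / len(q1). All sizes below are assumptions.\nimport numpy as np\nimport scipy.sparse as sp\n\nif __name__ == '__main__':\n    np.random.seed(0)\n    N, k, trials = 200, 50, 2000\n    A = sp.random(10, N, density=0.3, format='csr')\n    H = np.random.rand(N, 4)\n    exact = A.dot(H)\n    est = np.zeros_like(exact)\n    for _ in range(trials):\n        q1 = np.random.choice(N, k, replace=False)\n        est += (A[:, q1] * (float(N) / k)).dot(H[q1, :])\n    est /= trials\n    # the averaged estimate approaches the exact product\n    print(np.abs(est - exact).max())"
  },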
  {
    "path": "train_batch_multiRank_inductive_reddit_appr2layers.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN, MLP, GCN_APPRO\nimport json\nfrom networkx.readwrite import json_graph\nimport os\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_appr', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.001, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 12, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 1e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 100, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\n\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef loadRedditFromG(dataset_dir, inputfile):\n    f= open(dataset_dir+inputfile)\n    objects = []\n    for _ in range(pkl.load(f)):\n        objects.append(pkl.load(f))\n    adj, train_labels, val_labels, test_labels, train_index, val_index, test_index = tuple(objects)\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    return sp.csr_matrix(adj), sp.lil_matrix(feats), train_labels, val_labels, test_labels, train_index, val_index, test_index\n\n\ndef loadRedditFromNPZ(dataset_dir):\n    adj = sp.load_npz(dataset_dir+\"reddit_adj.npz\")\n    data = np.load(dataset_dir+\"reddit.npz\")\n\n    return adj, data['feats'], data['y_train'], data['y_val'], data['y_test'], data['train_index'], data['val_index'], data['test_index']\n\n\n\ndef transferRedditDataFormat(dataset_dir, output_file):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    labels = json.load(open(dataset_dir + \"/reddit-class_map.json\"))\n\n    train_ids = [n for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]\n    test_ids = [n for n in G.nodes() if G.node[n]['test']]\n    val_ids = [n for n in G.nodes() if G.node[n]['val']]\n    train_labels = [labels[i] for i in train_ids]\n    test_labels = [labels[i] for i in test_ids]\n    val_labels = [labels[i] for i in val_ids]\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    ## Logistic gets thrown off by big counts, so log transform num comments and score\n    feats[:, 0] = np.log(feats[:, 0] + 1.0)\n    feats[:, 1] = np.log(feats[:, 1] - min(np.min(feats[:, 1]), -1))\n    feat_id_map = json.load(open(dataset_dir + \"reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n\n    # train_feats = feats[[feat_id_map[id] for id in train_ids]]\n    # test_feats = feats[[feat_id_map[id] for id in test_ids]]\n\n    # numNode 
= len(feat_id_map)\n    # adj = sp.lil_matrix(np.zeros((numNode,numNode)))\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n\n    train_index = [feat_id_map[id] for id in train_ids]\n    val_index = [feat_id_map[id] for id in val_ids]\n    test_index = [feat_id_map[id] for id in test_ids]\n    np.savez(output_file, feats = feats, y_train=train_labels, y_val=val_labels, y_test = test_labels, train_index = train_index,\n             val_index=val_index, test_index = test_index)\n\n\ndef transferLabel2Onehot(labels, N):\n    y = np.zeros((len(labels),N))\n    for i in range(len(labels)):\n        pos = labels[i]\n        y[i,pos] =1\n    return y\n\ndef main(rank1, rank0):\n\n\n\n    # config = tf.ConfigProto(device_count={\"CPU\": 4}, # limit to num_cpu_core CPU usage\n    #                 inter_op_parallelism_threads = 1,\n    #                 intra_op_parallelism_threads = 4,\n    #                 log_device_placement=False)\n    adj, features, y_train, y_val, y_test,train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n    adj = adj+adj.T\n\n\n    y_train = transferLabel2Onehot(y_train, 41)\n    y_val = transferLabel2Onehot(y_val, 41)\n    y_test = transferLabel2Onehot(y_test, 41)\n\n    features = sp.lil_matrix(features)\n\n    adj_train = adj[train_index, :][:, train_index]\n\n\n    numNode_train = adj_train.shape[0]\n\n    train_mask = np.ones((numNode_train,))\n    val_mask = np.ones((y_val.shape[0],))\n    test_mask = np.ones((y_test.shape[0],))\n\n\n    # print(\"numNode\", numNode)\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features).todense()\n    train_features = features[train_index]\n\n    if FLAGS.model == 'gcn_appr':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Define placeholders\n    placeholders = {\n        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],\n        'features': tf.placeholder(tf.float32, shape=(None, features.shape[1])),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'labels_mask': tf.placeholder(tf.int32),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, mask, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n    saver = tf.train.Saver()\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[val_index, :])]\n    testSupport = [sparse_to_tuple(normADJ), 
sparse_to_tuple(normADJ[test_index, :])]\n\n    t = time.time()\n    maxACC = 0.0\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train, train_mask], batchsize=256, shuffle=True):\n            [normADJ_batch, y_train_batch, train_mask_batch] = batch\n            if sum(train_mask_batch) < 1:\n                continue\n\n\n            p1 = column_prop(normADJ_batch)\n            q1 = np.random.choice(np.arange(numNode_train), rank1, replace=False, p=p1)  # top layer\n            # q0 = np.random.choice(np.arange(numNode_train), rank0, p=p0)  # bottom layer\n            support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))\n\n            p2 = column_prop(normADJ_train[q1, :])\n            q0 = np.random.choice(np.arange(numNode_train), rank0, replace=False, p=p2)\n\n            support0 = sparse_to_tuple(normADJ_train[q1, :][:, q0])\n            features_inputs = np.diag(1.0 / (p2[q0] * rank0)).dot(train_features[q0, :])  # selected nodes for approximation\n\n            # distr = np.nonzero(np.sum(normADJ_batch, axis=0))[1]\n            # if rank1 > len(distr):\n            #     q1 = distr\n            # else:\n            #     q1 = np.random.choice(distr, rank1, replace=False)  # top layer\n            # distr0 = np.nonzero(np.sum(normADJ_train[q1,:], axis=0))[1]\n            # if rank0 > len(distr0):\n            #     q0 = distr0\n            # else:\n            #     q0 = np.random.choice(distr0, rank0, replace=False)  # top layer\n            #\n            # support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p0[q1] * rank1))))\n            # support0 = sparse_to_tuple(normADJ_train[q1, :][:, q0])\n            # features_inputs = np.diag(1.0 / (p0[q0] * rank0)).dot(train_features[q0, :])  # selected nodes for approximation\n\n\n            # Construct feed dictionary\n            feed_dict = construct_feed_dict(features_inputs, [support0, support1], y_train_batch, train_mask_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n            n=n+1\n\n\n        # Validation\n        cost, acc, duration = evaluate(features, valSupport, y_val, val_mask, placeholders)\n        cost_val.append(cost)\n\n        if epoch > 50 and acc>maxACC:\n            maxACC = acc\n            save_path = saver.save(sess, \"tmp/tmp_redditModel.ckpt\")\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time per batch=\", \"{:.5f}\".format((time.time() - t1)/n))\n\n        if epoch > FLAGS.early_stopping and np.mean(cost_val[-2:]) > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n    train_duration = time.time() - t\n    # Testing\n    if os.path.exists(\"tmp/tmp_redditModel.ckpt\"):\n        saver.restore(sess, \"tmp/tmp_redditModel.ckpt\")\n    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test, test_mask,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), 
\"rank0 = {}\".format(rank0), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration),\n          \"epoch = {}\".format(epoch + 1),\n          \"test time=\", \"{:.5f}\".format(test_duration))\n\ndef transferG2ADJ():\n    G = json_graph.node_link_graph(json.load(open(\"reddit/reddit-G.json\")))\n    feat_id_map = json.load(open(\"reddit/reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n    numNode = len(feat_id_map)\n    adj = np.zeros((numNode, numNode))\n    newEdges0 = [feat_id_map[edge[0]] for edge in G.edges()]\n    newEdges1 = [feat_id_map[edge[1]] for edge in G.edges()]\n\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n    adj = sp.csr_matrix((np.ones((len(newEdges0),)), (newEdges0, newEdges1)), shape=(numNode, numNode))\n    sp.save_npz(\"reddit_adj.npz\", adj)\n\nif __name__==\"__main__\":\n    # transferRedditDataFormat(\"reddit/\",\"data/reddit.npz\")\n\n    main(100,100)\n    # for k in [50, 100, 200, 400]:\n    #     main(100, k)"
  },
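  {
    "path": "examples/demo_two_layer_sampling.py",
    "content": "# Hypothetical sketch, not part of the original FastGCN code, of the two-layer\n# sampling in train_batch_multiRank_inductive_reddit_appr2layers.py: draw rank1\n# top-layer nodes from the batch column distribution, then rank0 bottom-layer\n# nodes from the column distribution of the reduced adjacency, rescaling each\n# layer by 1/(p*rank). The toy sizes and random adjacency are assumptions.\nimport numpy as np\nimport scipy.sparse as sp\nfrom scipy.sparse.linalg import norm as sparsenorm\n\n\ndef column_prop(adj):\n    column_norm = sparsenorm(adj, axis=0)\n    return column_norm / sum(column_norm)\n\n\nif __name__ == '__main__':\n    np.random.seed(0)\n    n, rank1, rank0 = 64, 8, 8\n    normADJ_train = sp.random(n, n, density=0.4, format='csr')\n    train_features = np.random.rand(n, 5)\n    normADJ_batch = normADJ_train[:16, :]\n    p1 = column_prop(normADJ_batch)\n    q1 = np.random.choice(np.arange(n), rank1, replace=False, p=p1)\n    support1 = normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1)))\n    p2 = column_prop(normADJ_train[q1, :])\n    q0 = np.random.choice(np.arange(n), rank0, replace=False, p=p2)\n    support0 = normADJ_train[q1, :][:, q0]\n    # bottom-layer inputs, rescaled exactly as in the original script\n    features_inputs = np.diag(1.0 / (p2[q0] * rank0)).dot(train_features[q0, :])\n    print(support1.shape, support0.shape, features_inputs.shape)"
  },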
  {
    "path": "train_batch_multiRank_inductive_reddit_onelayer.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport time\nimport tensorflow as tf\nimport scipy.sparse as sp\n\nfrom utils import *\nfrom models import GCN, MLP, GCN_APPRO_Onelayer\nimport json\nfrom networkx.readwrite import json_graph\n\n# Set random seed\nseed = 123\nnp.random.seed(seed)\ntf.set_random_seed(seed)\n\n# Settings\nflags = tf.app.flags\nFLAGS = flags.FLAGS\nflags.DEFINE_string('dataset', 'pubmed', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'\nflags.DEFINE_string('model', 'gcn_appr', 'Model string.')  # 'gcn', 'gcn_appr'\nflags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')\nflags.DEFINE_integer('epochs', 300, 'Number of epochs to train.')\nflags.DEFINE_integer('hidden1', 64, 'Number of units in hidden layer 1.')\nflags.DEFINE_float('dropout', 0.1, 'Dropout rate (1 - keep probability).')\nflags.DEFINE_float('weight_decay', 1e-4, 'Weight for L2 loss on embedding matrix.')\nflags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')\nflags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')\nrank1 = 300\nrank0 = 300\n# Load data\n\n\ndef iterate_minibatches_listinputs(inputs, batchsize, shuffle=False):\n    assert inputs is not None\n    numSamples = inputs[0].shape[0]\n    if shuffle:\n        indices = np.arange(numSamples)\n        np.random.shuffle(indices)\n    for start_idx in range(0, numSamples - batchsize + 1, batchsize):\n        if shuffle:\n            excerpt = indices[start_idx:start_idx + batchsize]\n        else:\n            excerpt = slice(start_idx, start_idx + batchsize)\n        yield [input[excerpt] for input in inputs]\n\n\ndef loadRedditFromG(dataset_dir, inputfile):\n    f= open(dataset_dir+inputfile)\n    objects = []\n    for _ in range(pkl.load(f)):\n        objects.append(pkl.load(f))\n    adj, train_labels, val_labels, test_labels, train_index, val_index, test_index = tuple(objects)\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    return sp.csr_matrix(adj), sp.lil_matrix(feats), train_labels, val_labels, test_labels, train_index, val_index, test_index\n\n\ndef loadRedditFromNPZ(dataset_dir):\n    adj = sp.load_npz(dataset_dir+\"reddit_adj.npz\")\n    data = np.load(dataset_dir+\"reddit.npz\")\n\n    return adj, data['feats'], data['y_train'], data['y_val'], data['y_test'], data['train_index'], data['val_index'], data['test_index']\n\n\n\ndef transferRedditDataFormat(dataset_dir, output_file):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    labels = json.load(open(dataset_dir + \"/reddit-class_map.json\"))\n\n    train_ids = [n for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]\n    test_ids = [n for n in G.nodes() if G.node[n]['test']]\n    val_ids = [n for n in G.nodes() if G.node[n]['val']]\n    train_labels = [labels[i] for i in train_ids]\n    test_labels = [labels[i] for i in test_ids]\n    val_labels = [labels[i] for i in val_ids]\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    ## Logistic gets thrown off by big counts, so log transform num comments and score\n    feats[:, 0] = np.log(feats[:, 0] + 1.0)\n    feats[:, 1] = np.log(feats[:, 1] - min(np.min(feats[:, 1]), -1))\n    feat_id_map = json.load(open(dataset_dir + \"reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n\n    # train_feats = feats[[feat_id_map[id] for id in train_ids]]\n    # test_feats = feats[[feat_id_map[id] for id in 
test_ids]]\n\n    # numNode = len(feat_id_map)\n    # adj = sp.lil_matrix(np.zeros((numNode,numNode)))\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n\n    train_index = [feat_id_map[id] for id in train_ids]\n    val_index = [feat_id_map[id] for id in val_ids]\n    test_index = [feat_id_map[id] for id in test_ids]\n    np.savez(output_file, feats = feats, y_train=train_labels, y_val=val_labels, y_test = test_labels, train_index = train_index,\n             val_index=val_index, test_index = test_index)\n\n\ndef transferLabel2Onehot(labels, N):\n    y = np.zeros((len(labels),N))\n    for i in range(len(labels)):\n        pos = labels[i]\n        y[i,pos] =1\n    return y\n\n\n\ndef run_regression(train_embeds, train_labels, test_embeds, test_labels):\n    np.random.seed(1)\n    from sklearn.linear_model import SGDClassifier\n    from sklearn.dummy import DummyClassifier\n    from sklearn.metrics import accuracy_score\n    dummy = DummyClassifier()\n    dummy.fit(train_embeds, train_labels)\n    log = SGDClassifier(loss=\"log\", n_jobs=55)\n    log.fit(train_embeds, train_labels)\n    print(\"Test scores\")\n    print(accuracy_score(test_labels, log.predict(test_embeds)))\n    print(\"Train scores\")\n    print(accuracy_score(train_labels, log.predict(train_embeds)))\n    print(\"Random baseline\")\n    print(accuracy_score(test_labels, dummy.predict(test_embeds)))\n\ndef main(rank1):\n    adj, features, y_train, y_val, y_test,train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n\n    adj = adj+adj.T\n\n    # train_index = train_index[:10000]\n    # val_index = val_index[:5000]\n    # test_index = test_index[:10000]\n    # y_train = transferLabel2Onehot(y_train, 50)[:10000]\n    # y_val = transferLabel2Onehot(y_val, 50)[:5000]\n    # y_test = transferLabel2Onehot(y_test, 50)[:10000]\n\n    y_train = transferLabel2Onehot(y_train, 50)\n    y_val = transferLabel2Onehot(y_val, 50)\n    y_test = transferLabel2Onehot(y_test, 50)\n\n\n\n    # adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)\n\n    features = sp.lil_matrix(features)\n\n\n    adj_train = adj[train_index, :][:, train_index]\n\n\n    adj_val = adj[val_index, :][:, val_index]\n\n    adj_test = adj[test_index, :][:, test_index]\n    numNode_train = adj_train.shape[0]\n\n    train_mask = np.ones((numNode_train,))\n    val_mask = np.ones((adj_val.shape[0],))\n    test_mask = np.ones((adj_test.shape[0],))\n\n\n    # print(\"numNode\", numNode)\n\n    # Some preprocessing\n    features = nontuple_preprocess_features(features)\n    train_features = features[train_index]\n\n    if FLAGS.model == 'gcn_appr':\n        normADJ_train = nontuple_preprocess_adj(adj_train)\n        normADJ = nontuple_preprocess_adj(adj)\n        # normADJ_val = nontuple_preprocess_adj(adj_val)\n        # normADJ_test = nontuple_preprocess_adj(adj_test)\n\n        num_supports = 2\n        model_func = GCN_APPRO_Onelayer\n    else:\n        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))\n\n    # Define placeholders\n    placeholders = {\n        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],\n        'features': tf.sparse_placeholder(tf.float32),\n        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),\n        'labels_mask': tf.placeholder(tf.int32),\n        'dropout': tf.placeholder_with_default(0., shape=()),\n        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable 
for sparse dropout\n    }\n\n    # Create model\n    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)\n\n    # Initialize session\n    sess = tf.Session()\n\n    # Define model evaluation function\n    def evaluate(features, support, labels, mask, placeholders):\n        t_test = time.time()\n        feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)\n        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)\n        return outs_val[0], outs_val[1], (time.time() - t_test)\n\n    # Init variables\n    sess.run(tf.global_variables_initializer())\n\n    cost_val = []\n\n    p0 = column_prop(normADJ_train)\n\n    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]\n    valSupport = [sparse_to_tuple(normADJ[val_index, :])]\n    testSupport = [sparse_to_tuple(normADJ[test_index, :])]\n\n    t = time.time()\n    # Train model\n    for epoch in range(FLAGS.epochs):\n        t1 = time.time()\n\n        n = 0\n        for batch in iterate_minibatches_listinputs([normADJ_train, y_train, train_mask], batchsize=5120, shuffle=True):\n            [normADJ_batch, y_train_batch, train_mask_batch] = batch\n            if sum(train_mask_batch) < 1:\n                continue\n            p1 = column_prop(normADJ_batch)\n            if rank1 is not None:\n                q1 = np.random.choice(np.arange(numNode_train), rank1, p=p1)  # top layer\n                # q0 = np.random.choice(np.arange(numNode_train), rank0, p=p0)  # bottom layer\n                support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))\n\n                features_inputs = sparse_to_tuple(train_features[q1, :])  # selected nodes for approximation\n            else:\n                support1 = sparse_to_tuple(normADJ_batch)\n                features_inputs = sparse_to_tuple(train_features)\n            # Construct feed dictionary\n            feed_dict = construct_feed_dict(features_inputs, [support1], y_train_batch, train_mask_batch,\n                                            placeholders)\n            feed_dict.update({placeholders['dropout']: FLAGS.dropout})\n\n            # Training step\n            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)\n\n\n        # Validation\n        cost, acc, duration = evaluate(sparse_to_tuple(features), valSupport, y_val, val_mask, placeholders)\n        cost_val.append(cost)\n\n        # Print results\n        print(\"Epoch:\", '%04d' % (epoch + 1), \"train_loss=\", \"{:.5f}\".format(outs[1]),\n              \"train_acc=\", \"{:.5f}\".format(outs[2]), \"val_loss=\", \"{:.5f}\".format(cost),\n              \"val_acc=\", \"{:.5f}\".format(acc), \"time=\", \"{:.5f}\".format(time.time() - t1))\n\n        if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):\n            # print(\"Early stopping...\")\n            break\n\n    train_duration = time.time() - t\n    # Testing\n    test_cost, test_acc, test_duration = evaluate(sparse_to_tuple(features), testSupport, y_test, test_mask,\n                                                  placeholders)\n    print(\"rank1 = {}\".format(rank1), \"rank0 = {}\".format(rank0), \"cost=\", \"{:.5f}\".format(test_cost),\n          \"accuracy=\", \"{:.5f}\".format(test_acc), \"training time=\", \"{:.5f}\".format(train_duration))\n\ndef transferG2ADJ():\n    G = json_graph.node_link_graph(json.load(open(\"reddit/reddit-G.json\")))\n    feat_id_map = 
json.load(open(\"reddit/reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n    numNode = len(feat_id_map)\n    adj = np.zeros((numNode, numNode))\n    newEdges0 = [feat_id_map[edge[0]] for edge in G.edges()]\n    newEdges1 = [feat_id_map[edge[1]] for edge in G.edges()]\n\n    # for edge in G.edges():\n    #     adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n    adj = sp.csr_matrix((np.ones((len(newEdges0),)), (newEdges0, newEdges1)), shape=(numNode, numNode))\n    sp.save_npz(\"reddit_adj.npz\", adj)\n\n\ndef original():\n    adj, features, y_train, y_val, y_test, train_index, val_index, test_index = loadRedditFromNPZ(\"data/\")\n    adj = adj+adj.T\n    normADJ = nontuple_preprocess_adj(adj)\n    features = adj.dot(features)\n\n    train_feats = features[train_index, :]\n    test_feats = features[test_index, :]\n\n    from sklearn.preprocessing import StandardScaler\n\n    scaler = StandardScaler()\n    scaler.fit(train_feats)\n    train_feats = scaler.transform(train_feats)\n    test_feats = scaler.transform(test_feats)\n    run_regression(train_feats, y_train, test_feats, y_test)\n\nif __name__==\"__main__\":\n    # transferRedditDataFormat(\"reddit/\",\"data/reddit.npz\")\n\n    # original()\n    main(50)\n\n\n\n"
  },
  {
    "path": "transformRedditGraph2NPZ.py",
    "content": "#### Please first download original Reddit Graph Data: http://snap.stanford.edu/graphsage/reddit.zip\n####\n\n\nimport json\nfrom networkx.readwrite import json_graph\nimport scipy.sparse as sp\nimport numpy as np\nimport pickle as pkl\n\n\ndef loadRedditFromG(dataset_dir, inputfile):\n    f= open(dataset_dir+inputfile)\n    objects = []\n    for _ in range(pkl.load(f)):\n        objects.append(pkl.load(f))\n    adj, train_labels, val_labels, test_labels, train_index, val_index, test_index = tuple(objects)\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    return sp.csr_matrix(adj), sp.lil_matrix(feats), train_labels, val_labels, test_labels, train_index, val_index, test_index\n\n\ndef loadRedditFromNPZ(dataset_dir):\n    adj = sp.load_npz(dataset_dir+\"reddit_adj.npz\")\n    data = np.load(dataset_dir+\"reddit.npz\")\n\n    return adj, data['feats'], data['y_train'], data['y_val'], data['y_test'], data['train_index'], data['val_index'], data['test_index']\n\n\ndef transferRedditData2AdjNPZ(dataset_dir):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    feat_id_map = json.load(open(dataset_dir + \"/reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n    numNode = len(feat_id_map)\n    print(numNode)\n    adj = sp.lil_matrix((numNode, numNode))\n    print(\"no\")\n    for edge in G.edges():\n        adj[feat_id_map[edge[0]], feat_id_map[edge[1]]] = 1\n    sp.save_npz(\"reddit_adj.npz\", sp.coo_matrix(adj))\n\n\ndef transferRedditDataFormat(dataset_dir, output_file):\n    G = json_graph.node_link_graph(json.load(open(dataset_dir + \"/reddit-G.json\")))\n    labels = json.load(open(dataset_dir + \"/reddit-class_map.json\"))\n\n    train_ids = [n for n in G.nodes() if not G.node[n]['val'] and not G.node[n]['test']]\n    test_ids = [n for n in G.nodes() if G.node[n]['test']]\n    val_ids = [n for n in G.nodes() if G.node[n]['val']]\n    train_labels = [labels[i] for i in train_ids]\n    test_labels = [labels[i] for i in test_ids]\n    val_labels = [labels[i] for i in val_ids]\n    feats = np.load(dataset_dir + \"/reddit-feats.npy\")\n    ## Logistic gets thrown off by big counts, so log transform num comments and score\n    feats[:, 0] = np.log(feats[:, 0] + 1.0)\n    feats[:, 1] = np.log(feats[:, 1] - min(np.min(feats[:, 1]), -1))\n    feat_id_map = json.load(open(dataset_dir + \"reddit-id_map.json\"))\n    feat_id_map = {id: val for id, val in feat_id_map.iteritems()}\n\n    train_index = [feat_id_map[id] for id in train_ids]\n    val_index = [feat_id_map[id] for id in val_ids]\n    test_index = [feat_id_map[id] for id in test_ids]\n    np.savez(output_file, feats=feats, y_train=train_labels, y_val=val_labels, y_test=test_labels,\n             train_index=train_index,\n             val_index=val_index, test_index=test_index)\n\n\nif __name__==\"__main__\":\n    # transferRedditData2AdjNPZ(\"reddit\")\n    transferRedditDataFormat(\"reddit\",\"reddit.npz\")"
  },
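  {
    "path": "examples/demo_edges_to_csr.py",
    "content": "# Hypothetical note, not part of the original FastGCN code:\n# transferRedditData2AdjNPZ fills an lil_matrix edge by edge, which is very\n# slow for the millions of Reddit edges. The vectorized csr constructor used\n# by transferG2ADJ in the training scripts builds the same matrix in one call;\n# a toy comparison of the two constructions:\nimport numpy as np\nimport scipy.sparse as sp\n\nif __name__ == '__main__':\n    edges = [(0, 1), (1, 2), (2, 0), (3, 1)]\n    numNode = 4\n    rows = [e[0] for e in edges]\n    cols = [e[1] for e in edges]\n    adj = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(numNode, numNode))\n    # equivalent to the slow loop: adj_lil[r, c] = 1 for each edge\n    adj_lil = sp.lil_matrix((numNode, numNode))\n    for r, c in edges:\n        adj_lil[r, c] = 1\n    assert (adj - adj_lil.tocsr()).nnz == 0\n    print(adj.toarray())"
  },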
  {
    "path": "utils.py",
    "content": "import numpy as np\nimport pickle as pkl\nimport networkx as nx\nimport scipy.sparse as sp\nfrom scipy.sparse.linalg.eigen.arpack import eigsh\nimport sys\nfrom scipy.sparse.linalg import norm as sparsenorm\nfrom scipy.linalg import qr\n# from sklearn.metrics import f1_score\n\n\ndef parse_index_file(filename):\n    \"\"\"Parse index file.\"\"\"\n    index = []\n    for line in open(filename):\n        index.append(int(line.strip()))\n    return index\n\n\ndef sample_mask(idx, l):\n    \"\"\"Create mask.\"\"\"\n    mask = np.zeros(l)\n    mask[idx] = 1\n    return np.array(mask, dtype=np.bool)\n\n#\n# def calc_f1(y_true, y_pred):\n#     y_true = np.argmax(y_true, axis=1)\n#     y_pred = np.argmax(y_pred, axis=1)\n#     return f1_score(y_true, y_pred, average=\"micro\"), f1_score(y_true, y_pred, average=\"macro\")\n#\n\n#\n# def load_data(dataset_str):\n#     \"\"\"Load data.\"\"\"\n#     names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']\n#     objects = []\n#     for i in range(len(names)):\n#         with open(\"data/ind.{}.{}\".format(dataset_str, names[i]), 'rb') as f:\n#             if sys.version_info > (3, 0):\n#                 objects.append(pkl.load(f, encoding='latin1'))\n#             else:\n#                 objects.append(pkl.load(f))\n#\n#     x, y, tx, ty, allx, ally, graph = tuple(objects)\n#     test_idx_reorder = parse_index_file(\"data/ind.{}.test.index\".format(dataset_str))\n#     test_idx_range = np.sort(test_idx_reorder)\n#\n#     if dataset_str == 'citeseer':\n#         # Fix citeseer dataset (there are some isolated nodes in the graph)\n#         # Find isolated nodes, add them as zero-vecs into the right position\n#         test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder)+1)\n#         tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))\n#         tx_extended[test_idx_range-min(test_idx_range), :] = tx\n#         tx = tx_extended\n#         ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))\n#         ty_extended[test_idx_range-min(test_idx_range), :] = ty\n#         ty = ty_extended\n#\n#     features = sp.vstack((allx, tx)).tolil()\n#     features[test_idx_reorder, :] = features[test_idx_range, :]\n#     adj = nx.adjacency_matrix(nx.from_dict_of_lists(graph))\n#\n#     labels = np.vstack((ally, ty))\n#     labels[test_idx_reorder, :] = labels[test_idx_range, :]\n#\n#     idx_test = test_idx_range.tolist()\n#     idx_train = range(len(y))\n#     idx_val = range(len(y), len(y)+500)\n#\n#     train_mask = sample_mask(idx_train, labels.shape[0])\n#     val_mask = sample_mask(idx_val, labels.shape[0])\n#     test_mask = sample_mask(idx_test, labels.shape[0])\n#\n#     y_train = np.zeros(labels.shape)\n#     y_val = np.zeros(labels.shape)\n#     y_test = np.zeros(labels.shape)\n#     y_train[train_mask, :] = labels[train_mask, :]\n#     y_val[val_mask, :] = labels[val_mask, :]\n#     y_test[test_mask, :] = labels[test_mask, :]\n#\n#     return adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask\n#\n\n\ndef load_data(dataset_str):\n    \"\"\"Load data.\"\"\"\n    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']\n    objects = []\n    for i in range(len(names)):\n        with open(\"data/ind.{}.{}\".format(dataset_str, names[i]), 'rb') as f:\n            if sys.version_info > (3, 0):\n                objects.append(pkl.load(f, encoding='latin1'))\n            else:\n                objects.append(pkl.load(f))\n\n    x, y, tx, ty, allx, ally, graph = tuple(objects)\n    
\n\ndef sparse_to_tuple(sparse_mx):\n    \"\"\"Convert sparse matrix to tuple representation.\"\"\"\n    def to_tuple(mx):\n        if not sp.isspmatrix_coo(mx):\n            mx = mx.tocoo()\n        coords = np.vstack((mx.row, mx.col)).transpose()\n        values = mx.data\n        shape = mx.shape\n        return coords, values, shape\n\n    if isinstance(sparse_mx, list):\n        for i in range(len(sparse_mx)):\n            sparse_mx[i] = to_tuple(sparse_mx[i])\n    else:\n        sparse_mx = to_tuple(sparse_mx)\n\n    return sparse_mx\n\n\ndef nontuple_preprocess_features(features):\n    \"\"\"Row-normalize the feature matrix (sparse output, no tuple conversion).\"\"\"\n    rowsum = np.array(features.sum(1))\n    r_inv = np.power(rowsum, -1).flatten()\n    r_inv[np.isinf(r_inv)] = 0.\n    r_mat_inv = sp.diags(r_inv)\n    features = r_mat_inv.dot(features)\n    return features\n\n\ndef preprocess_features(features):\n    \"\"\"Row-normalize feature matrix and convert to tuple representation.\"\"\"\n    rowsum = np.array(features.sum(1))\n    r_inv = np.power(rowsum, -1).flatten()\n    r_inv[np.isinf(r_inv)] = 0.\n    r_mat_inv = sp.diags(r_inv)\n    features = r_mat_inv.dot(features)\n    return sparse_to_tuple(features)\n\n\ndef normalize_adj(adj):\n    \"\"\"Symmetrically normalize adjacency matrix: D^-1/2 A D^-1/2.\"\"\"\n    adj = sp.coo_matrix(adj)\n    rowsum = np.array(adj.sum(1))\n    d_inv_sqrt = np.power(rowsum, -0.5).flatten()\n    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.\n    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)\n    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()\n\n\ndef nontuple_preprocess_adj(adj):\n    \"\"\"Renormalization trick: symmetrically normalize A + I, returned as CSR.\"\"\"\n    adj_normalized = normalize_adj(sp.eye(adj.shape[0]) + adj)\n    # adj_normalized = sp.eye(adj.shape[0]) + normalize_adj(adj)\n    return adj_normalized.tocsr()\n
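\n# Quick sanity check for the renormalization trick above (illustrative only):\n# on the 2-node path graph, A + I is the all-ones 2x2 matrix and both degrees\n# are 2, so every entry of D^-1/2 (A + I) D^-1/2 equals 1/2.\n#\n#     A = sp.csr_matrix(np.array([[0., 1.], [1., 0.]]))\n#     print(nontuple_preprocess_adj(A).toarray())  # [[0.5, 0.5], [0.5, 0.5]]\n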
\n\ndef column_prop(adj):\n    \"\"\"FastGCN sampling distribution: probability of each vertex proportional\n    to the norm of its column in adj.\"\"\"\n    column_norm = sparsenorm(adj, axis=0)\n    # column_norm = pow(sparsenorm(adj, axis=0), 2)\n    norm_sum = sum(column_norm)\n    return column_norm/norm_sum\n\n\ndef mix_prop(adj, features, sparseinputs=False):\n    \"\"\"Sampling distribution mixing adjacency column norms with feature row norms.\"\"\"\n    adj_column_norm = sparsenorm(adj, axis=0)\n    if sparseinputs:\n        features_row_norm = sparsenorm(features, axis=1)\n    else:\n        features_row_norm = np.linalg.norm(features, axis=1)\n    mix_norm = adj_column_norm*features_row_norm\n\n    norm_sum = sum(mix_norm)\n    return mix_norm / norm_sum\n\n\ndef preprocess_adj(adj):\n    \"\"\"Preprocessing of adjacency matrix for simple GCN model and conversion to tuple representation.\"\"\"\n    # adj_appr = np.array(sp.csr_matrix.todense(adj))\n    # # adj_appr = dense_lanczos(adj_appr, 100)\n    # adj_appr = dense_RandomSVD(adj_appr, 100)\n    # if adj_appr.sum(1).min() < 0:\n    #     adj_appr = adj_appr - (adj_appr.sum(1).min()-0.5)*sp.eye(adj_appr.shape[0])\n    # else:\n    #     adj_appr = adj_appr + sp.eye(adj_appr.shape[0])\n    # adj_normalized = normalize_adj(adj_appr)\n\n    # adj_normalized = normalize_adj(adj+sp.eye(adj.shape[0]))\n    # adj_appr = np.array(sp.coo_matrix.todense(adj_normalized))\n    # # adj_normalized = dense_RandomSVD(adj_appr, 100)\n    # adj_normalized = dense_lanczos(adj_appr, 100)\n\n    adj_normalized = normalize_adj(sp.eye(adj.shape[0]) + adj)\n    # adj_normalized = sp.eye(adj.shape[0]) + normalize_adj(adj)\n    return sparse_to_tuple(adj_normalized)\n
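\n\n# The distributions above feed FastGCN's importance sampling. A minimal sketch\n# of that use (hypothetical helper, not one of the functions the training\n# scripts import): sample columns of the normalized adjacency with probability\n# proportional to column_prop, rescaling by 1/(sample_size * p) so that the\n# estimate of adj_norm.dot(H) stays unbiased in expectation.\ndef sampled_support(adj_norm, sample_size):\n    p = column_prop(adj_norm)\n    sampled = np.random.choice(adj_norm.shape[1], sample_size, replace=True, p=p)\n    scale = sp.diags(1.0 / (p[sampled] * sample_size))\n    return adj_norm[:, sampled].dot(scale), sampled\n# e.g. support, idx = sampled_support(adj_norm, 100); support.dot(H[idx])\n# then approximates adj_norm.dot(H) for a dense activation matrix H.\n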
\n\nfrom lanczos import lanczos  # local module providing the Lanczos iteration\n\n\ndef dense_lanczos(A, K):\n    \"\"\"Rank-K approximation of A from K steps of the Lanczos iteration.\"\"\"\n    q = np.random.randn(A.shape[0])\n    Q, sigma = lanczos(A, K, q)\n    A2 = np.dot(Q[:, :K], np.dot(sigma[:K, :K], Q[:, :K].T))\n    return sp.csr_matrix(A2)\n\n\ndef sparse_lanczos(A, k):\n    \"\"\"Rank-k approximation of sparse A via Lanczos with full reorthogonalization.\"\"\"\n    q = sp.random(A.shape[0], 1)\n    n = A.shape[0]\n    Q = sp.lil_matrix(np.zeros((n, k+1)))\n    A = sp.lil_matrix(A)\n\n    Q[:, 0] = q/sparsenorm(q)\n\n    alpha = 0\n    beta = 0\n\n    for i in range(k):\n        if i == 0:\n            q = A*Q[:, i]\n        else:\n            q = A*Q[:, i] - beta*Q[:, i-1]\n        alpha = q.T*Q[:, i]\n        q = q - Q[:, i]*alpha\n        q = q - Q[:, :i]*Q[:, :i].T*q  # full reorthogonalization\n        beta = sparsenorm(q)\n        Q[:, i+1] = q/beta\n\n    Q = Q[:, :k]\n\n    Sigma = Q.T*A*Q\n    A2 = Q[:, :k]*Sigma[:k, :k]*Q[:, :k].T\n    return A2\n    # return Q, Sigma\n\n\ndef dense_RandomSVD(A, K):\n    \"\"\"Rank-K approximation of A via randomized range finding (project onto Q Q^T).\"\"\"\n    G = np.random.randn(A.shape[0], K)\n    B = np.dot(A, G)\n    Q, R = qr(B, mode='economic')\n    M = np.dot(Q, np.dot(Q.T, A))\n    return sp.csr_matrix(M)\n\n\ndef construct_feed_dict(features, support, labels, labels_mask, placeholders):\n    \"\"\"Construct feed dictionary.\"\"\"\n    feed_dict = dict()\n    feed_dict.update({placeholders['labels']: labels})\n    feed_dict.update({placeholders['labels_mask']: labels_mask})\n    feed_dict.update({placeholders['features']: features})\n    feed_dict.update({placeholders['support'][i]: support[i] for i in range(len(support))})\n    feed_dict.update({placeholders['num_features_nonzero']: features[1].shape})\n    return feed_dict\n\n\ndef chebyshev_polynomials(adj, k):\n    \"\"\"Calculate Chebyshev polynomials up to order k. Return a list of sparse matrices (tuple representation).\"\"\"\n    print(\"Calculating Chebyshev polynomials up to order {}...\".format(k))\n\n    adj_normalized = normalize_adj(adj)\n    laplacian = sp.eye(adj.shape[0]) - adj_normalized\n    largest_eigval, _ = eigsh(laplacian, 1, which='LM')\n    scaled_laplacian = (2. / largest_eigval[0]) * laplacian - sp.eye(adj.shape[0])\n\n    t_k = list()\n    t_k.append(sp.eye(adj.shape[0]))\n    t_k.append(scaled_laplacian)\n\n    def chebyshev_recurrence(t_k_minus_one, t_k_minus_two, scaled_lap):\n        s_lap = sp.csr_matrix(scaled_lap, copy=True)\n        return 2 * s_lap.dot(t_k_minus_one) - t_k_minus_two\n\n    for i in range(2, k+1):\n        t_k.append(chebyshev_recurrence(t_k[-1], t_k[-2], scaled_laplacian))\n\n    return sparse_to_tuple(t_k)\n
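\n\nif __name__ == '__main__':\n    # Smoke test (sketch; assumes the Planetoid files data/ind.cora.* exist):\n    # load Cora, row-normalize the features, and build a small Chebyshev\n    # support list to confirm the pipeline runs end to end.\n    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data('cora')\n    features = nontuple_preprocess_features(features)\n    support = chebyshev_polynomials(adj, 2)\n    print('features:', features.shape, 'chebyshev terms:', len(support))\n"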
  }
]