Repository: YZHANG1270/Aspect-Based-Sentiment-Analysis
Branch: master
Commit: e5505ab3fdcf
Files: 30
Total size: 20.9 MB
Directory structure:
gitextract_rxbagi6o/
├── README.md
├── ai_challenge_sentiment/
│   ├── code/
│   │   └── sentiment_analysis2018_baseline/
│   │       ├── README.md
│   │       ├── __init__.py
│   │       ├── config.py
│   │       ├── data_process.py
│   │       ├── main_predict.py
│   │       ├── main_train.py
│   │       ├── model.py
│   │       └── requirements.txt
│   ├── model.py
│   └── train.py
├── aspect_predict.py
├── config.json
├── data/
│   ├── aspect/
│   │   ├── aspect_svc_test.xlsx
│   │   └── aspect_svc_train.xlsx
│   ├── chinese/
│   │   ├── CH_CAME_SB1_TEST.xlsx
│   │   ├── CH_PHNS_SB1_TEST.xlsx
│   │   ├── Chinese_phones_training.xlsx
│   │   └── camera_training.xlsx
│   └── polarity/
│       └── polarity_docu.xlsx
├── polarity_predict.py
├── train/
│   ├── aspect_classifier.py
│   ├── model/
│   │   ├── bilstm.py
│   │   └── model.py
│   └── polarity_classifier.py
└── utils/
    ├── __init__.py
    ├── baidu_tagging.py
    ├── data_process.py
    ├── grammar.py
    └── utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: README.md
================================================
# ABSA
Aspect Based Sentiment Analysis
Although the analysis is aspect (opinion) based, it is still sentence-level analysis, because it has to be carried out sentence by sentence.

##### Concept references
- ABSA reference presentation [[ppt](https://www.iaria.org/conferences2016/filesHUSO16/OrpheeDeClercq_Keynote_ABSA.pdf)]
- Alibaba Cloud product review analysis [[link](https://help.aliyun.com/document_detail/64231.html?spm=5176.12095382.1232858.4.739e3b24xUnvbZ)]

| Parameter | Value |
| -------------- | ------------------------------------------------------------ |
| textPolarity | Polarity of the whole text: positive, neutral, or negative; returns -100 when the text field is invalid |
| textIntensity | Sentiment intensity of the whole text, in [-1, 1]; larger means more positive, smaller means more negative, near 0 means neutral |
| aspectItem | List of aspect sentiments; each element is a JSON object |
| aspectCategory | Aspect category |
| aspectIndex | Start and end position of the aspect term |
| aspectTerm | Aspect term |
| opinionTerm | Opinion term |
| aspectPolarity | Polarity of the aspect span (positive, neutral, or negative) |
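To make the schema concrete, here is a hypothetical response object with the fields from the table above; all values (including the numeric polarity encoding) are invented for illustration:

```python
# Hypothetical Aliyun-style ABSA response; field names follow the table above,
# values are invented for illustration.
response = {
    "textPolarity": 1,        # overall polarity (here: 1 positive, 0 neutral, -1 negative)
    "textIntensity": 0.82,    # overall intensity in [-1, 1]
    "aspectItem": [{
        "aspectCategory": "DISPLAY#QUALITY",
        "aspectIndex": [2, 4],    # start/end offsets of the aspect term
        "aspectTerm": "屏幕",      # aspect term ("screen")
        "opinionTerm": "不错",     # opinion term ("not bad")
        "aspectPolarity": 1,
    }],
}

assert -1 <= response["textIntensity"] <= 1
```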
##### Task Process
1. Extract aspect terms, per sentence
2. Extract opinion terms, per sentence
3. Locate the start and end position of each aspect term
4. Aspect term -> EA classification
5. Opinion term -> polarity classification
6. Overall polarity of the whole text (positive / negative / neutral) with its probability
##### Done Tasks
Tasks actually completed with the available datasets
- [x] Sentence-level EA classification
- [x] Sentence-level sentiment polarity analysis
##### To do
- [ ] Opinion filtering: text noise, fake reviews, paid posters, ads, opinion-free or meaningless text
- [ ] Negation handling
##### SemEval ABSA
- NLP的 SemEval 论文合辑 [[ACL](https://www.aclweb.org/anthology/)]
- SemEval - 2014 - ABSA [[competition](http://alt.qcri.org/semeval2014/task4/)] [[data](http://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools)]
- SemEval - 2015 - ABSA [[competition](http://alt.qcri.org/semeval2015/task12/)] [[data](http://alt.qcri.org/semeval2015/task12/index.php?id=data-and-tools)] [[paper](https://www.aclweb.org/anthology/S15-2082)]
- SemEval - 2016 - ABSA [[competition](http://alt.qcri.org/semeval2016/task5/)] [[data](http://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools)] [[guideline](http://alt.qcri.org/semeval2016/task5/data/uploads/absa2016_annotationguidelines.pdf)] [[paper](https://www.aclweb.org/anthology/S16-1002)]
- bonus: CodaLab Competitions [[intro](https://www.hse.ru/data/2017/05/31/1171931089/CodaLabCompetitions.pdf)]
##### Reference GitHub projects
Most of these datasets come from the 2014-2016 SemEval competitions
- [data: self data] [Unsupervised-Aspect-Extraction](https://github.com/ruidan/Unsupervised-Aspect-Extraction)
- [data: SemEval-2016] [aspect-extraction](https://github.com/soujanyaporia/aspect-extraction)
- [data: SemEval-2015] [AspectBasedSentimentAnalysis](https://github.com/yardstick17/AspectBasedSentimentAnalysis) Tried this project: it combines syntactic analysis with machine learning and extracts aspect terms via grammar rules. The code is deeply nested, so reusing it directly is not recommended.
- [data: SemEval-2016] [Review_aspect_extraction](https://github.com/yafangy/Review_aspect_extraction)
- [data: SemEval-2014, 2016] [DE-CNN](https://github.com/howardhsu/DE-CNN)
- [data: SemEval-2015] [Coupled-Multi-layer-Attentions](https://github.com/happywwy/Coupled-Multi-layer-Attentions)
- [data: SemEval-2016 laptop] [mem_absa](https://github.com/ganeshjawahar/mem_absa)
- [data: SemEval-2014] [ABSA-PyTorch](https://github.com/songyouwei/ABSA-PyTorch)
- [data: SemEval-2014, 2016] [Attention_Based_LSTM_AspectBased_SA](https://github.com/gangeshwark/Attention_Based_LSTM_AspectBased_SA)
- [data: SemEval-2014] [ABSA_Keras](https://github.com/AlexYangLi/ABSA_Keras) Uses TensorFlow Hub; ran into version problems with hub and could not get it to run.
- [data: SemEval-2016] [ABSA](https://github.com/LingxB/ABSA/tree/master/Data/SemEval)
##### paper
- Deep Learning for Aspect-Based Sentiment Analysis [[paper](https://cs224d.stanford.edu/reports/WangBo.pdf)]
- Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings [[paper](https://www.aclweb.org/anthology/D15-1168)]
- Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts [[paper](https://ai.tencent.com/ailab/media/publications/naacl2018/Encoding_Conversation_Context_for_Neural_Keyphrase_Extraction_from_Microblog_Posts.pdf)]
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF [[paper](https://arxiv.org/pdf/1603.01354.pdf)]
- [2012] Tag extraction and ranking for user reviews [[paper](http://lipiji.com/docs/li2011opinion.pdf)]
##### Datasets
###### Chinese
- AI-Challenge [[data](https://drive.google.com/file/d/1OInXRx_OmIJgK3ZdoFZnmqUi0rGfOaQo/view)]
- SemEval ABSA 2016 [[data](http://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools)]
###### English
- Amazon product data [[data](http://jmcauley.ucsd.edu/data/amazon/)]
- Web data: Amazon reviews [[data](https://snap.stanford.edu/data/web-Amazon.html)]
- Amazon Fine Food Reviews [[kaggle](https://www.kaggle.com/snap/amazon-fine-food-reviews)]
- SemEval ABSA
#### Directions for improvement
##### Character/word/sentence text embeddings
###### Chinese
- Chinese Word Vectors [[github](https://github.com/Embedding/Chinese-Word-Vectors)]
- nlp_chinese_corpus [[github](https://github.com/brightmart/nlp_chinese_corpus)]
- When vectorizing, should general-domain and domain-specific corpora be combined, or should each be embedded independently?

The table of contents of an ABSA book below is a useful map of the field's logic.
#### ABSA Book Outline
1. Introduction
2. Aspect-Based Sentiment Analysis (ABSA)
- 2.1. The three tasks of ABSA
- 2.2. Domain and benchmark datasets
- 2.3. Previous approaches to ABSA tasks
- 2.4. Evaluation measures of ABSA tasks
3. Deep Learning for ABSA
- 3.1. Multiple layers of DNN
- 3.2. Initialization of input vectors
- 3.2.1. Word embeddings vectors
- 3.2.2. Featuring vectors
- 3.2.3. Part-Of-Speech (POS) and chunk tags
- 3.2.4. Commonsense knowledge
- 3.3. Training process of DNNs
- 3.4. Convolutional Neural Network Model (CNN)
- 3.4.1. Architecture
- 3.4.2. Application in consumer review domain
- 3.5. Recurrent Neural Network Models (RNN)
- 3.5.1. Computation of RNN models
- 3.5.2. Bidirectional RNN
- 3.5.3. Attention mechanism and memory networks
- 3.5.4. Application in the consumer review domain
- 3.5.5. Application in targeted sentiment analysis
- 3.6. Recursive Neural Network Model (RecNN)
- 3.6.1. Architecture
- 3.6.2. Application
- 3.7. Hybrid models
4. Comparison of performance on benchmark datasets
- 4.1. Opinion target extraction
- 4.2. Aspect category detection
- 4.3. Sentiment polarity of aspect-based consumer reviews
- 4.4. Sentiment polarity of targeted text
5. Challenges
- 5.1. Domain adaptation
- 5.2. Multilingual application
- 5.3. Technical requirements
- 5.4. Linguistic complications
6. Conclusion
7. Appendix: List of Abbreviations
8. References
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/README.md
================================================
AI Challenger Sentiment Analysis Baseline
=========================================
Overview
---
This project provides a baseline for competition participants to get started quickly. It covers the full workflow: data loading, word segmentation, feature extraction, model definition and wrapping, model training, validation, persistence, and prediction. The baseline is only a simple reference; participants are encouraged to use their imagination and build stronger models for this task.
Environment
---
* The main dependencies and their versions are listed in requirements.txt
Project structure
---
* src/config.py  Project configuration, mainly file read/write paths
* src/data_process.py  Data processing, mainly data loading and preprocessing
* src/model.py  Model definition and usage wrapper
* src/main_train.py  Training pipeline: data loading, word segmentation, feature extraction, training, validation, and model saving
* src/main_predict.py  Prediction pipeline: loading data and model, word segmentation, prediction, and saving results
Usage
---
* Configure: set the file paths in config.py
* Train: run `nohup python main_train.py -mn your_model_name &` to train and save the model; the validation F1_score is reported in the log
* Predict: run `nohup python main_predict.py -mn your_model_name &` to load the model from the previous step and predict on the test set
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/__init__.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/data_process.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import pandas as pd
import jieba
# load data
def load_data_from_csv(file_name, header=0, encoding="utf-8"):
data_df = pd.read_csv(file_name, header=header, encoding=encoding)
return data_df
# word segmentation
def seg_words(contents):
contents_segs = list()
for content in contents:
segs = jieba.lcut(content)
contents_segs.append(" ".join(segs))
return contents_segs
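`seg_words` returns one space-joined string per input document so that a downstream whitespace-based vectorizer can split it back into tokens. A jieba-free sketch of that contract (the segmentation below is hand-written, not real jieba output):

```python
# jieba-free illustration of the seg_words output format:
# one space-joined string per input document.
def join_tokens(token_lists):
    return [" ".join(tokens) for tokens in token_lists]

# Hand-segmented stand-in for jieba.lcut output:
docs = [["这块", "屏幕", "不错"], ["电池", "好看"]]
print(join_tokens(docs))  # ['这块 屏幕 不错', '电池 好看']
```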
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/main_predict.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from data_process import seg_words, load_data_from_csv
import config
import logging
import argparse
from sklearn.externals import joblib
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] <%(processName)s> (%(threadName)s) %(message)s')
logger = logging.getLogger(__name__)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('-mn', '--model_name', type=str, nargs='?',
help='the name of model')
args = parser.parse_args()
model_name = args.model_name
if not model_name:
model_name = "model_dict.pkl"
# load data
logger.info("start load data")
test_data_df = load_data_from_csv(config.test_data_path)
# load model
logger.info("start load model")
classifier_dict = joblib.load(config.model_save_path + model_name)
columns = test_data_df.columns.tolist()
# seg words
logger.info("start seg test data")
content_test = test_data_df.iloc[:, 1]
content_test = seg_words(content_test)
logger.info("complete seg test data")
# model predict
logger.info("start predict test data")
for column in columns[2:]:
test_data_df[column] = classifier_dict[column].predict(content_test)
        logger.info("complete %s predict" % column)
test_data_df.to_csv(config.test_data_predict_out_path, encoding="utf_8_sig", index=False)
    logger.info("complete predict test data")
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/main_train.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from data_process import load_data_from_csv, seg_words
from model import TextClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import config
import logging
import numpy as np
from sklearn.externals import joblib
import os
import argparse
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] <%(processName)s> (%(threadName)s) %(message)s')
logger = logging.getLogger(__name__)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('-mn', '--model_name', type=str, nargs='?',
help='the name of model')
args = parser.parse_args()
model_name = args.model_name
if not model_name:
model_name = "model_dict.pkl"
# load train data
logger.info("start load data")
train_data_df = load_data_from_csv(config.train_data_path)
validate_data_df = load_data_from_csv(config.validate_data_path)
content_train = train_data_df.iloc[:, 1]
logger.info("start seg train data")
content_train = seg_words(content_train)
logger.info("complete seg train data")
columns = train_data_df.columns.values.tolist()
logger.info("start train feature extraction")
vectorizer_tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 5), min_df=5, norm='l2')
vectorizer_tfidf.fit(content_train)
logger.info("complete train feature extraction models")
    logger.info("vocab size: %s" % len(vectorizer_tfidf.vocabulary_))
# model train
logger.info("start train model")
classifier_dict = dict()
for column in columns[2:]:
label_train = train_data_df[column]
text_classifier = TextClassifier(vectorizer=vectorizer_tfidf)
logger.info("start train %s model" % column)
text_classifier.fit(content_train, label_train)
logger.info("complete train %s model" % column)
classifier_dict[column] = text_classifier
logger.info("complete train model")
# validate model
content_validate = validate_data_df.iloc[:, 1]
logger.info("start seg validate data")
content_validate = seg_words(content_validate)
logger.info("complete seg validate data")
logger.info("start validate model")
f1_score_dict = dict()
for column in columns[2:]:
label_validate = validate_data_df[column]
text_classifier = classifier_dict[column]
f1_score = text_classifier.get_f1_score(content_validate, label_validate)
f1_score_dict[column] = f1_score
f1_score = np.mean(list(f1_score_dict.values()))
str_score = "\n"
for column in columns[2:]:
str_score = str_score + column + ":" + str(f1_score_dict[column]) + "\n"
logger.info("f1_scores: %s\n" % str_score)
logger.info("f1_score: %s" % f1_score)
logger.info("complete validate model")
# save model
logger.info("start save model")
model_save_path = config.model_save_path
if not os.path.exists(model_save_path):
os.makedirs(model_save_path)
joblib.dump(classifier_dict, model_save_path + model_name)
logger.info("complete save model")
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/model.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] <%(processName)s> (%(threadName)s) %(message)s')
logger = logging.getLogger(__name__)
class TextClassifier():
    def __init__(self, vectorizer, classifier=None):
        # default classifier: RBF-kernel SVC; pass e.g. MultinomialNB() to override
        if classifier is None:
            classifier = SVC(kernel="rbf")
            # classifier = SVC(kernel="linear")
        self.classifier = classifier
        self.vectorizer = vectorizer

    def features(self, x):
        return self.vectorizer.transform(x)

    def fit(self, x, y):
        self.classifier.fit(self.features(x), y)

    def predict(self, x):
        return self.classifier.predict(self.features(x))

    def score(self, x, y):
        return self.classifier.score(self.features(x), y)

    def get_f1_score(self, x, y):
        return f1_score(y, self.predict(x), average='macro')
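The per-column training loop in main_train.py shares one TF-IDF vectorizer across an independent classifier per label column. A toy end-to-end sketch of that pattern (the English corpus, label names, and values are invented for illustration; assumes scikit-learn is installed):

```python
# Toy sketch of the per-column training pattern: one shared TF-IDF
# vectorizer, one independent SVC per label column.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

corpus = ["screen is great", "battery is bad", "screen looks great", "battery died fast"]
labels = {"DISPLAY#QUALITY": [1, 0, 1, 0], "BATTERY#QUALITY": [0, 1, 0, 1]}

vectorizer = TfidfVectorizer()  # the real code uses ngram_range=(1, 5), min_df=5
features = vectorizer.fit_transform(corpus)

classifier_dict = {}
for column, y in labels.items():
    clf = SVC(kernel="rbf")
    clf.fit(features, y)          # one binary classifier per label column
    classifier_dict[column] = clf

test_features = vectorizer.transform(["the screen is really great"])
predictions = {c: int(clf.predict(test_features)[0]) for c, clf in classifier_dict.items()}
print(predictions)
```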
================================================
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/requirements.txt
================================================
# python 2.7.13
numpy==1.13.1
pandas==0.20.3
jieba==0.39
scikit-learn==0.19.2
================================================
FILE: ai_challenge_sentiment/model.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] <%(processName)s> (%(threadName)s) %(message)s')
logger = logging.getLogger(__name__)
class TextClassifier():
    def __init__(self, vectorizer, classifier=None):
        # default classifier: RBF-kernel SVC; pass e.g. MultinomialNB() to override
        if classifier is None:
            classifier = SVC(kernel="rbf")
            # classifier = SVC(kernel="linear")
        self.classifier = classifier
        self.vectorizer = vectorizer

    def features(self, x):
        return self.vectorizer.transform(x)

    def fit(self, x, y):
        self.classifier.fit(self.features(x), y)

    def predict(self, x):
        return self.classifier.predict(self.features(x))

    def score(self, x, y):
        return self.classifier.score(self.features(x), y)

    def get_f1_score(self, x, y):
        return f1_score(y, self.predict(x), average='macro')
================================================
FILE: ai_challenge_sentiment/train.py
================================================
# -*- coding: utf-8 -*-
"""
Spyder Editor
This is a temporary script file.
"""
import os
os.chdir("C:/Users/LUMI/Desktop/sentiment")
import pandas as pd
import jieba
from model import TextClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn.externals import joblib
def seg_words(contents):
contents_segs = list()
for content in contents:
segs = jieba.lcut(content)
contents_segs.append(" ".join(segs))
return contents_segs
# load train data
train_data_df = pd.read_csv('data/train/train.csv')
validate_data_df = pd.read_csv('data/validation/validation.csv')
content_train = train_data_df.iloc[:, 1]
content_train = seg_words(content_train)
columns = train_data_df.columns.values.tolist()
vectorizer_tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 5), min_df=5, norm='l2')
vectorizer_tfidf.fit(content_train)
# model train
classifier_dict = dict()
for column in columns[2:]:
label_train = train_data_df[column]
text_classifier = TextClassifier(vectorizer=vectorizer_tfidf)
text_classifier.fit(content_train, label_train)
classifier_dict[column] = text_classifier
# validate model
content_validate = validate_data_df.iloc[:, 1]
content_validate = seg_words(content_validate)
f1_score_dict = dict()
for column in columns[2:]:
label_validate = validate_data_df[column]
text_classifier = classifier_dict[column]
f1_score = text_classifier.get_f1_score(content_validate, label_validate)
f1_score_dict[column] = f1_score
f1_score = np.mean(list(f1_score_dict.values()))
str_score = "\n"
for column in columns[2:]:
str_score = str_score + column + ":" + str(f1_score_dict[column]) + "\n"
# save model
model_save_path = 'model/'       # hypothetical output directory
model_name = 'model_dict.pkl'    # hypothetical model file name
if not os.path.exists(model_save_path):
    os.makedirs(model_save_path)
joblib.dump(classifier_dict, model_save_path + model_name)
================================================
FILE: aspect_predict.py
================================================
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import os
from sklearn.externals import joblib
from utils.utils import delimiter
from utils.data_process import seg_words,load_aspect_list
class AspectPredict(object):
def __init__(self):
path_delimiter = delimiter()
path_absa = os.path.abspath('.')
# config
model_name = 'aspect_svc' # todo: add to config
path_config = path_absa + path_delimiter + 'config.json'
# load model
path_model = path_absa + path_delimiter + 'model' + path_delimiter + '{}.mdl'.format(model_name)
self.model = joblib.load(path_model)
# load aspect list
self.aspect_list = load_aspect_list(path_config)
def predict(self, text):
# 1. generate result
result = dict()
result['text'] = text
result['aspectCategory'] = []
# 2. seg words
content_test = seg_words([text])
# 3. predict
all_result = dict()
for column in self.aspect_list:
all_result[column] = self.model[column].predict(content_test)[0]
if all_result[column]>0.5:
result['aspectCategory'].append(column)
result['all_result'] = all_result
print('PREDICT RESULT:',result)
print('PREDICT ASPECT:', result['aspectCategory'])
return result
if __name__=="__main__":
aspect = AspectPredict()
aspect.predict('这块屏幕不错')
================================================
FILE: config.json
================================================
{"aspect_list": ["HARDWARE#USABILITY", "BATTERY#USABILITY", "HARDWARE#QUALITY", "MEMORY#GENERAL", "OS#PRICE", "MULTIMEDIA_DEVICES#QUALITY", "MULTIMEDIA_DEVICES#OPERATION_PERFORMANCE", "PORTS#DESIGN_FEATURES", "MULTIMEDIA_DEVICES#USABILITY", "OS#GENERAL", "SUPPORT#MISCELLANEOUS", "KEYBOARD#GENERAL", "POWER_SUPPLY#OPERATION_PERFORMANCE", "PHONE#QUALITY", "MEMORY#DESIGN_FEATURES", "CPU#USABILITY", "OS#CONNECTIVITY", "SOFTWARE#MISCELLANEOUS", "CPU#OPERATION_PERFORMANCE", "KEYBOARD#USABILITY", "PORTS#USABILITY", "KEYBOARD#QUALITY", "HARD_DISC#QUALITY", "MULTIMEDIA_DEVICES#CONNECTIVITY", "SOFTWARE#OPERATION_PERFORMANCE", "MEMORY#USABILITY", "PHONE#CONNECTIVITY", "DISPLAY#OPERATION_PERFORMANCE", "PHONE#DESIGN_FEATURES", "KEYBOARD#OPERATION_PERFORMANCE", "HARDWARE#OPERATION_PERFORMANCE", "POWER_SUPPLY#CONNECTIVITY", "PHONE#USABILITY", "OS#QUALITY", "BATTERY#OPERATION_PERFORMANCE", "HARDWARE#CONNECTIVITY", "POWER_SUPPLY#QUALITY", "HARD_DISC#OPERATION_PERFORMANCE", "SUPPORT#QUALITY", "PHONE#OPERATION_PERFORMANCE", "CPU#GENERAL", "SUPPORT#USABILITY", "DISPLAY#QUALITY", "OS#DESIGN_FEATURES", "POWER_SUPPLY#USABILITY", "HARDWARE#DESIGN_FEATURES", "CPU#QUALITY", "PHONE#MISCELLANEOUS", "SOFTWARE#QUALITY", "OS#OPERATION_PERFORMANCE", "WARRANTY#OPERATION_PERFORMANCE", "PHONE#GENERAL", "PHONE#PRICE", "MULTIMEDIA_DEVICES#GENERAL", "PORTS#OPERATION_PERFORMANCE", "POWER_SUPPLY#GENERAL", "KEYBOARD#DESIGN_FEATURES", "MEMORY#QUALITY", "SOFTWARE#USABILITY", "DISPLAY#DESIGN_FEATURES", "BATTERY#QUALITY", "PORTS#CONNECTIVITY", "PORTS#QUALITY", "HARDWARE#GENERAL", "OS#USABILITY", "SOFTWARE#GENERAL", "DISPLAY#USABILITY", "DISPLAY#GENERAL", "MULTIMEDIA_DEVICES#DESIGN_FEATURES", "BATTERY#DESIGN_FEATURES", "OTHERS", "SOFTWARE#CONNECTIVITY", "SOFTWARE#DESIGN_FEATURES"]}
================================================
FILE: data/polarity/polarity_docu.xlsx
================================================
[File too large to display: 20.9 MB]
================================================
FILE: polarity_predict.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import os
from sklearn.externals import joblib
from utils.utils import delimiter
from utils.grammar import chinese_only
from utils.data_process import seg_words, gen_text_vec
class PolarityClassifier(object):
"""
text classification
"""
def __init__(self):
# config
model_name = 'polarity_doc' # doc-based
# path
path_delimiter = delimiter()
if 'absa' in os.path.abspath('.').split(path_delimiter):
path_absa = os.path.abspath('.')
else:
            # invoked path = path_comment
path_absa = os.path.abspath('.') + path_delimiter + 'train' \
+ path_delimiter + 'sentiment' + path_delimiter + 'absa'
# model path
path_model_dir = path_absa + path_delimiter + 'model'
# load tokenizer
path_tokenizer = path_model_dir + path_delimiter + '{}.tk'.format(model_name)
self.tokenizer = joblib.load(path_tokenizer)
# load model
path_model = path_model_dir + path_delimiter + '{}.mdl'.format(model_name)
self.model = joblib.load(path_model)
self.model._make_predict_function()
def predict(self, comment):
# 1. chinese only
cmt = chinese_only([comment])
# 2. jieba token
cmt = seg_words(cmt)[0]
# 3. gen word vector
_cmt = gen_text_vec(self.tokenizer, cmt, maxlen = 200)
# token observation
# split_tokens = []
# for token in str(_cmt).split(" "):
# if token.isdigit():
# split_tokens.append(token)
# print("len(split_tokens):{}".format(len(split_tokens)))
# 4. predict
neg_prob = self.model.predict(_cmt)[0][0]
# neg_prob = (neg_prob > 0.5)
# 5. json result output
result = {'items':[{'negative_prob': 0,'sentiment': 0}], 'log_id': '', 'text': ''}
result['items'][0]['negative_prob'] = neg_prob
        result['items'][0]['sentiment'] = int(round(neg_prob))  # 1 = negative review; 0 = positive review
result['text'] = comment
print("SENTIMENT RESULT: ",result)
return result
if __name__=="__main__":
t = PolarityClassifier()
t.predict('这块电池好看')
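`gen_text_vec` (defined in utils/data_process.py, not shown in this extract) presumably maps the space-separated tokens to integer ids via the tokenizer and pads to `maxlen`; a dependency-free sketch of that transformation:

```python
# Dependency-free sketch of the id-mapping + padding step that
# gen_text_vec presumably performs (Keras pads on the left by default).
def text_to_padded_ids(word_index, text, maxlen):
    ids = [word_index[w] for w in text.split() if w in word_index]
    ids = ids[:maxlen]                       # truncate overly long documents
    return [0] * (maxlen - len(ids)) + ids   # left-pad with 0

vocab = {"电池": 1, "好看": 2}  # toy word index; out-of-vocabulary words are dropped
print(text_to_padded_ids(vocab, "这块 电池 好看", maxlen=5))  # [0, 0, 0, 1, 2]
```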
================================================
FILE: train/aspect_classifier.py
================================================
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import os
import ast
import json
import pandas as pd
import numpy as np
from sklearn.externals import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from train.model.model import TextClassifier
from utils.utils import delimiter
from utils.data_process import nan_to_others,category_transpose,seg_words,load_aspect_list
class AspectClassifier(object):
"""
Aspect(=EA) Classifier Train Part
"""
def __init__(self):
path_delimiter = delimiter()
path_absa = os.path.abspath('..')
# config
task_tag = 'aspect_'
model_name = task_tag + 'svc'
# config path
self.path_config = path_absa + path_delimiter + 'config.json'
# model path
self.model_path = path_absa + path_delimiter + 'model' + path_delimiter + '{}.mdl'.format(model_name)
# data path
self.path_data = path_absa + path_delimiter +'data'
self.path_data_ch = path_absa + path_delimiter +'data' + path_delimiter + 'chinese' + path_delimiter
self.path_train_df = self.path_data + path_delimiter + 'aspect' + path_delimiter + '{}_train.xlsx'.format(model_name)
self.path_test_df = self.path_data + path_delimiter + 'aspect' + path_delimiter + '{}_test.xlsx'.format(model_name)
def data_process(self):
if os.path.isfile(self.path_train_df) \
and os.path.isfile(self.path_test_df) \
and os.path.isfile(self.path_config):
train_df = pd.read_excel(self.path_train_df)
test_df = pd.read_excel(self.path_test_df)
self.category_list = load_aspect_list(self.path_config)
else:
# 1. load data
train = pd.read_excel(self.path_data_ch+'Chinese_phones_training.xlsx')
test = pd.read_excel(self.path_data_ch+'CH_PHNS_SB1_TEST.xlsx')
# 2. mark NaN as 'OTHERS'
_data = []
for data in [train, test]:
df = nan_to_others(data)
_data.append(df)
# 3. generate category list
self.category_list = list(set(_data[0]['category'])) # len = 73
# 4. save category list to config
cate_dict = {'aspect_list':self.category_list}
            with open(self.path_config, "w") as f:
                f.write(json.dumps(cate_dict))
# 5. generate df by category transpose
all_data = []
for d in _data:
df = category_transpose(d, self.category_list)
all_data.append(df)
# 6. save data
train_df, test_df = all_data[0], all_data[1]
train_df.to_excel(self.path_train_df, index=False)
test_df.to_excel(self.path_test_df, index=False)
return train_df, test_df
def train(self, train_df):
content_train = seg_words(train_df['text'])
vectorizer_tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 5), min_df=5, norm='l2')
vectorizer_tfidf.fit(content_train)
# model train
classifier_dict = dict()
for column in self.category_list:
print(column)
label_train = train_df[column]
text_classifier = TextClassifier(vectorizer=vectorizer_tfidf)
text_classifier.fit(content_train, label_train)
classifier_dict[column] = text_classifier
# save model
if os.path.isfile(self.model_path):
pass
else:
joblib.dump(classifier_dict, self.model_path)
def test(self, test_df):
classifier = joblib.load(self.model_path)
content_test = seg_words(test_df['text'])
f1_score_dict = dict()
for column in self.category_list:
label_validate = test_df[column]
text_classifier = classifier[column]
f1_score = text_classifier.get_f1_score(content_test, label_validate)
f1_score_dict[column] = f1_score
f1_score = np.mean(list(f1_score_dict.values()))
print('F1-SCORE-DICT: ', f1_score_dict)
print('MEAN OF F1-SCORE-DICT: ', f1_score)
return f1_score_dict
if __name__=="__main__":
aspect = AspectClassifier()
train_df, test_df = aspect.data_process()
aspect.train(train_df)
aspect.test(test_df)
================================================
FILE: train/model/bilstm.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding, Dropout,Bidirectional, GlobalMaxPool1D
class BiLSTM():
def __init__(self, max_features, embed_size):
model = Sequential()
model.add(Embedding(max_features, embed_size))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(GlobalMaxPool1D())
model.add(Dense(20, activation="relu"))
model.add(Dropout(0.05))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
self.classifier = model
    def fit(self, x, y, batch_size, epochs, validation_split):
        self.classifier.fit(x, y, batch_size=batch_size, epochs=epochs, validation_split=validation_split)
def predict(self, x):
return self.classifier.predict(x)
    def evaluate(self, y_true, y_pred):
        # sklearn metrics expect (y_true, y_pred) in that order
        acc = accuracy_score(y_true, y_pred)
        f1 = f1_score(y_true, y_pred)
        cfs_matrix = confusion_matrix(y_true, y_pred)
print('Accuracy Score:', acc)
print('F1-score: {0}'.format(f1))
print('Confusion matrix:\n', cfs_matrix)
return acc, f1, cfs_matrix
def _make_predict_function(self):
self.classifier._make_predict_function()
================================================
FILE: train/model/model.py
================================================
#!/usr/bin/env python
# -*- coding:utf-8 -*-
__author__ = 'ZhangYi'
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] <%(processName)s> (%(threadName)s) %(message)s')
logger = logging.getLogger(__name__)
class TextClassifier():
    def __init__(self, vectorizer, classifier=None):
        # default classifier: RBF-kernel SVC; pass e.g. MultinomialNB() to override
        if classifier is None:
            classifier = SVC(kernel="rbf")
            # classifier = SVC(kernel="linear")
        self.classifier = classifier
        self.vectorizer = vectorizer

    def features(self, x):
        return self.vectorizer.transform(x)

    def fit(self, x, y):
        self.classifier.fit(self.features(x), y)

    def predict(self, x):
        return self.classifier.predict(self.features(x))

    def score(self, x, y):
        return self.classifier.score(self.features(x), y)

    def get_f1_score(self, x, y):
        return f1_score(y, self.predict(x), average='macro')
================================================
FILE: train/polarity_classifier.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import os
import pandas as pd
from sklearn.externals import joblib
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from train.model.bilstm import BiLSTM
from utils.utils import delimiter
from utils.grammar import chinese_only
from utils.data_process import merge_excel,seg_words,remove_empty_row,gen_text_vec
class PolarityClassifier(object):
"""
train sentiment model and generate model file
"""
def __init__(self):
path_delimiter = delimiter()
path_absa = os.path.abspath('..')
# config
self.maxlen = 200 # doc word length
task_tag = 'polarity_'
model_name = task_tag + 'docu'
# model path
path_model = path_absa + path_delimiter + 'model'
self.model_path = path_model + path_delimiter + '{}.mdl'.format(model_name)
self.path_tokenizer = path_model + path_delimiter + '{}.tk'.format(model_name)
# data path
path_data_doc_level = path_delimiter.join(path_absa.split(path_delimiter)[:-2]) + path_delimiter + "data" \
+ path_delimiter + 'sentiment' + path_delimiter + 'document_level'
self.path_train_data = path_data_doc_level + path_delimiter + 'train_data'
self.path_data = path_absa + path_delimiter + 'data'
self.path_corpus = self.path_data + path_delimiter + 'polarity' + path_delimiter + '{}.xlsx'.format(model_name)
# generate tokenizer
self.data = self.data_process()
self.tokenizer = self.gen_tokenizer(self.data['cmt_split'])
def data_process(self):
if os.path.isfile(self.path_corpus):
data = pd.read_excel(self.path_corpus)
else:
# 1. merge data
data = merge_excel(self.path_train_data)
# 2. Chinese character only
data['cmt_zh'] = chinese_only(data['comment_content'])
# 3. jieba token for dictionary
data['cmt_split'] = seg_words(data['cmt_zh'])
# 4. remove empty comment
data = remove_empty_row(data, 'cmt_split')
# 5. save data
data.to_excel(self.path_corpus)
return data
def gen_tokenizer(self, cut_corpus_list):
if os.path.isfile(self.path_tokenizer):
tokenizer = joblib.load(self.path_tokenizer)
else:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(cut_corpus_list.astype(str))
joblib.dump(tokenizer, self.path_tokenizer)
return tokenizer
def gen_train_test(self, x, y):
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
return X_train, X_test, y_train, y_test
def train(self, X_train, y_train):
embed_size = 256
max_features = 66000 # dictionary size
classifier = BiLSTM(max_features, embed_size)
epochs = 2
batch_size = 100
X_tr = gen_text_vec(self.tokenizer, X_train, self.maxlen)
classifier.fit(X_tr, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)
# save model
if os.path.isfile(self.model_path):
print('model exist already')
else:
joblib.dump(classifier, self.model_path)
def test(self, X_test, y_test):
X_te = gen_text_vec(self.tokenizer, X_test, self.maxlen)
# load model
model = joblib.load(self.model_path)
pred_prob = model.predict(X_te)
# pred = (pred_prob > 0.65)
pred = [int(round(i[0])) for i in pred_prob]
y_test = [int(i) for i in y_test]
# evaluate
        # avoid shadowing the built-in eval
        evaluation = model.evaluate(y_test, pred)
        return evaluation
def batch_predict(self, batch_cmt_df):
# 1. chinese only
batch_cmt_df['cmt_zh'] = chinese_only(batch_cmt_df['comment_content'])
# 2. token cut
batch_cmt_df['cmt_split'] = seg_words(batch_cmt_df['cmt_zh'])
        # note: no remove-empty step here yet
        # 3. predict
        self.test(batch_cmt_df['cmt_split'], batch_cmt_df['label'])
# # save result
# result = pd.DataFrame(np.array([self.X_test,self.y_test,pred]).T,columns=['comment_zh','GroundTruth','bilstm'])
# result.to_excel('data/sentiment/result_.xlsx')
if __name__=="__main__":
pc = PolarityClassifier()
data = pc.data
X_train, X_test, y_train, y_test = pc.gen_train_test(data['cmt_split'], data['label'])
pc.train(X_train, y_train)
pc.test(X_test, y_test)
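The rounding step in `test()` above turns the BiLSTM's per-row sigmoid outputs into hard 0/1 polarity labels. A dependency-free sketch of just that step; the explicit `threshold` parameter is an assumption, echoing the commented-out `pred_prob > 0.65` line in the source:

```python
def probs_to_labels(pred_prob, threshold=0.5):
    """Collapse rows of sigmoid outputs (each a 1-element list, as
    returned by model.predict) into hard 0/1 polarity labels."""
    return [int(row[0] > threshold) for row in pred_prob]

# 0.9 and 0.65 clear the default threshold, 0.3 does not
labels = probs_to_labels([[0.9], [0.3], [0.65]])
```

An explicit threshold also sidesteps `round()`'s banker's rounding, where `round(0.5)` is 0.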
================================================
FILE: utils/__init__.py
================================================
================================================
FILE: utils/baidu_tagging.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from aip import AipNlp
import pandas as pd
import time
""" 你的 APPID AK SK """
APP_ID = '155934'
API_KEY = 'PBW2w1dveS7x3YcKSZW0V7'
SECRET_KEY = 'AOE75EWZqeI6kM7Kesq8i6FzQruDI'
client = AipNlp(APP_ID, API_KEY, SECRET_KEY)
# input file
source_file = "input file path"  # placeholder: path to the Excel file of comments
source_df = pd.read_excel(source_file)
comments = []
neg_probs = []
pos_probs = []
confidences = []
sentiments = []
complete_count = 0
# error bookkeeping
err_count = 0
err_comment = []
start_time = time.time()
# loop over requests, one comment per call
i = 0
while i < len(source_df):
comment = source_df["comment_content"][i]
    try:
        query_result = client.sentimentClassify(comment[:1024])
    except Exception as e:
        # the request itself failed; query_result may be unbound here,
        # so report the exception instead
        print("request failed: {}".format(e))
        err_count += 1
        err_comment.append(comment)
        i += 1
        continue
    try:
        result = query_result['items'][0]
        neg_prob = result['negative_prob']
        pos_prob = result['positive_prob']
        confidence = result['confidence']
        sentiment = result['sentiment']
    except KeyError:
        # response lacks 'items': most likely the API QPS limit;
        # back off briefly, then retry the same index
        print("QPS limit hit at i={}".format(i))
        time.sleep(0.5)
        continue
i += 1
comments.append(comment)
neg_probs.append(neg_prob)
pos_probs.append(pos_prob)
confidences.append(confidence)
sentiments.append(sentiment)
complete_count += 1
print("总共:{}条".format(len(source_df)))
print("请求完成: {}条".format(complete_count))
print("完成进度:{}%".format(round(complete_count / len(source_df) * 100, 2)))
cost_mins = (time.time() - start_time) / 60
print("累计用时:{}分钟".format(round(cost_mins, 2)))
avg_query_time = complete_count / cost_mins
# print("每条请求平均用时:{}".format(avg_query_time))
left_mins = (len(source_df) - complete_count - err_count) / avg_query_time
print("预计还需:{}分钟".format(round(left_mins, 2)))
print("\n")
print("所有请求完成!")
print("请求总数量:{}".format(len(source_df)))
print("请求过程中存在问题的数量:{}".format(err_count))
# save results
# successful requests
desti_df = pd.DataFrame()
desti_df["comment"] = comments
desti_df["neg_probs"] = neg_probs
desti_df["pos_probs"] = pos_probs
desti_df["confidences"] = confidences
desti_df["sentiments"] = sentiments
desti_file = "result file path"  # placeholder: where to save successful results
desti_df.to_excel(desti_file, engine='xlsxwriter')
# failed requests
err_df = pd.DataFrame()
err_file = "error file path"  # placeholder: where to save the failed comments
err_df["comment"] = err_comment
err_df.to_excel(err_file, engine='xlsxwriter')  # xlsxwriter copes with odd characters the default engine rejects
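The polling loop above mixes three concerns: outright request failures (skip and record), missing response keys from rate limiting (retry the same index), and progress bookkeeping. A minimal retry-with-backoff sketch of the first two; `client_call` is a hypothetical stand-in for `client.sentimentClassify`:

```python
import time

def query_with_retry(client_call, items, max_retries=3, backoff=0.5):
    """Retry each item on KeyError (the rate-limit signature above),
    and record items whose calls keep failing."""
    results, errors = [], []
    for item in items:
        for _ in range(max_retries):
            try:
                results.append(client_call(item))
                break
            except KeyError:
                time.sleep(backoff)  # rate-limited: back off, retry
        else:
            errors.append(item)  # exhausted retries
    return results, errors
```

Separating the retry policy from the bookkeeping makes it easy to tune `max_retries` and `backoff` without touching the save logic.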
================================================
FILE: utils/data_process.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import ast
import jieba
import itertools
import pandas as pd
import numpy as np
from keras.preprocessing.sequence import pad_sequences
# mark NaN as 'OTHERS'
def nan_to_others(df):
new_cate = []
new_polarity = []
    # df must contain columns ['text', 'category', 'polarity']
for idx, i in enumerate(df['polarity']):
if i in ['negative', 'positive', 'neutral', 'conflict']:
new_cate.append(df['category'][idx])
new_polarity.append(i)
else:
new_cate.append('OTHERS')
new_polarity.append('OTHERS')
_df = pd.DataFrame(np.array([df['text'], new_cate, new_polarity]).T, columns=['text', 'category', 'polarity'])
return _df
# tokenize
def seg_words(contents):
contents_segs = list()
for content in contents:
segs = jieba.lcut(content)
contents_segs.append(" ".join(segs))
return contents_segs
# get text vector
def gen_text_vec(tokenizer, cut_corpus_list, maxlen):
text_vec = tokenizer.texts_to_sequences(cut_corpus_list)
t_vec = pad_sequences(text_vec, maxlen=maxlen)
return t_vec
# category transpose
def category_transpose(df, category_list):
for i in category_list:
l_ist = []
        # df must contain a 'category' column
for cate in df['category']:
if cate == i:
l_ist.append(1)
else:
l_ist.append(0)
df[i] = l_ist
return df
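`category_transpose` builds one 0/1 indicator column per aspect. The same idea without pandas, as a sketch (the function name is ours):

```python
def one_hot_categories(categories, category_list):
    """For each aspect in category_list, emit a 0/1 indicator list
    marking which rows carry that aspect."""
    return {aspect: [1 if cate == aspect else 0 for cate in categories]
            for aspect in category_list}
```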
# load config: aspect_list
def load_aspect_list(path_config):
    # the config file holds a single dict literal with key 'aspect_list';
    # only the first line is read (no counter loop or manual close needed)
    with open(path_config, "r", encoding='utf-8') as f:
        first_line = f.readline()
    return ast.literal_eval(first_line)['aspect_list']
# merge excel
def merge_excel(path_data_dir):
cmt_l = []
scr_l = []
    # every merged df must contain ['comment_content', 'label']
data_source = ['/2019-04-12_lock_comment_jd_spider_baidu_sentiment.xlsx', \
'/20190329_train_lock_comments_document_level_with_label.xls', \
'/all_comments_document_level_without_lock_comments.xls', \
'/bad_comments_in_forum_mi.com_youpin.xls']
for i in data_source:
path_data = path_data_dir + i
_data = pd.read_excel(path_data)
cmt_l.append(_data['comment_content'])
scr_l.append(_data['label'])
comment = list(itertools.chain.from_iterable(cmt_l))
score = list(itertools.chain.from_iterable(scr_l))
data = pd.DataFrame(np.array([comment, score]).T, columns=['comment_content', 'label'])
return data
# remove row by column with empty value
def remove_empty_row(df, column_name):
row_to_delete = []
for idx, i in enumerate(df[column_name]):
if not bool(i):
row_to_delete.append(idx)
df = df.drop(df.index[row_to_delete])
return df.reset_index(drop=True)
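`remove_empty_row` drops rows whose tokenized text is falsy. The equivalent filter over plain dicts, as a pandas-free sketch:

```python
def remove_empty(rows, key):
    """Keep only rows whose value under `key` is truthy
    (non-empty string, non-None)."""
    return [row for row in rows if bool(row.get(key))]
```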
================================================
FILE: utils/grammar.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import re
def chinese_only(txt_list):
    # keep only CJK unified ideographs; runs of stripped characters
    # collapse to a single comma (pattern compiled once, not per comment)
    non_chinese = re.compile(u'[^\u4e00-\u9fa5]')
    cmt_zh = []
    for cmt in txt_list:
        line = cmt.strip()
        zh = " ".join(non_chinese.split(line)).strip()
        cmt_zh.append(",".join(zh.split()))
    return cmt_zh
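A quick demonstration of what `chinese_only` produces: everything outside the CJK unified ideographs range (`\u4e00`-`\u9fa5`) is stripped, and the surviving runs are re-joined with ASCII commas. Same logic, reproduced standalone:

```python
import re

NON_CHINESE = re.compile(u'[^\u4e00-\u9fa5]')

def chinese_only_demo(txt_list):
    out = []
    for cmt in txt_list:
        # split on non-Chinese chars, drop the empty pieces, rejoin
        zh = " ".join(NON_CHINESE.split(cmt.strip())).strip()
        out.append(",".join(zh.split()))
    return out

# "iPhone很好用!!" keeps only 很好用; the full-width comma in the
# second string becomes an ASCII comma between the two kept runs
print(chinese_only_demo(["iPhone很好用!!", "屏幕清晰,电池一般"]))
```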
================================================
FILE: utils/utils.py
================================================
# -*- coding: utf-8 -*-
__author__ = 'ZhangYi'
import sys
def delimiter():
    # note: 'win' in sys.platform would also match 'darwin' (macOS),
    # so test the prefix instead
    path_delimiter = '/'
    if sys.platform.startswith('win'):
        path_delimiter = '\\'
    return path_delimiter
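`delimiter()` reimplements what the standard library already exposes: `os.sep` is the platform separator and `os.path.join` assembles paths portably. A sketch of building this repo's data paths that way (the helper name is ours):

```python
import os

def data_path(*parts):
    """Join path components with the platform separator, replacing
    the manual path_delimiter concatenation used in this repo."""
    return os.path.join(*parts)

# 'data/polarity/polarity_docu.xlsx' on POSIX,
# 'data\\polarity\\polarity_docu.xlsx' on Windows
corpus = data_path('data', 'polarity', 'polarity_docu.xlsx')
```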
SYMBOL INDEX (58 symbols across 13 files)
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/data_process.py
function load_data_from_csv (line 9) | def load_data_from_csv(file_name, header=0, encoding="utf-8"):
function seg_words (line 17) | def seg_words(contents):
FILE: ai_challenge_sentiment/code/sentiment_analysis2018_baseline/model.py
class TextClassifier (line 13) | class TextClassifier():
method __init__ (line 15) | def __init__(self, vectorizer, classifier=MultinomialNB()):
method features (line 21) | def features(self, x):
method fit (line 24) | def fit(self, x, y):
method predict (line 28) | def predict(self, x):
method score (line 32) | def score(self, x, y):
method get_f1_score (line 35) | def get_f1_score(self, x, y):
FILE: ai_challenge_sentiment/model.py
class TextClassifier (line 13) | class TextClassifier():
method __init__ (line 15) | def __init__(self, vectorizer, classifier=MultinomialNB()):
method features (line 21) | def features(self, x):
method fit (line 24) | def fit(self, x, y):
method predict (line 28) | def predict(self, x):
method score (line 32) | def score(self, x, y):
method get_f1_score (line 35) | def get_f1_score(self, x, y):
FILE: ai_challenge_sentiment/train.py
function seg_words (line 20) | def seg_words(contents):
FILE: aspect_predict.py
class AspectPredict (line 11) | class AspectPredict(object):
method __init__ (line 12) | def __init__(self):
method predict (line 28) | def predict(self, text):
FILE: polarity_predict.py
class PolarityClassifier (line 13) | class PolarityClassifier(object):
method __init__ (line 17) | def __init__(self):
method predict (line 42) | def predict(self, comment):
FILE: train/aspect_classifier.py
class AspectClassifier (line 16) | class AspectClassifier(object):
method __init__ (line 20) | def __init__(self):
method data_process (line 40) | def data_process(self):
method train (line 82) | def train(self, train_df):
method test (line 102) | def test(self, test_df):
FILE: train/model/bilstm.py
class BiLSTM (line 11) | class BiLSTM():
method __init__ (line 12) | def __init__(self, max_features, embed_size):
method fit (line 24) | def fit(self, x, y, batch_size, epochs, validation_split):
method predict (line 27) | def predict(self, x):
method evaluate (line 30) | def evaluate(self, y_true, y_pred):
method _make_predict_function (line 40) | def _make_predict_function(self):
FILE: train/model/model.py
class TextClassifier (line 14) | class TextClassifier():
method __init__ (line 16) | def __init__(self, vectorizer, classifier=MultinomialNB()):
method features (line 22) | def features(self, x):
method fit (line 25) | def fit(self, x, y):
method predict (line 29) | def predict(self, x):
method score (line 33) | def score(self, x, y):
method get_f1_score (line 36) | def get_f1_score(self, x, y):
FILE: train/polarity_classifier.py
class PolarityClassifier (line 18) | class PolarityClassifier(object):
method __init__ (line 22) | def __init__(self):
method data_process (line 48) | def data_process(self):
method gen_tokenizer (line 69) | def gen_tokenizer(self, cut_corpus_list):
method gen_train_test (line 78) | def gen_train_test(self, x, y):
method train (line 82) | def train(self, X_train, y_train):
method test (line 98) | def test(self, X_test, y_test):
method batch_predict (line 114) | def batch_predict(self, batch_cmt_df):
FILE: utils/data_process.py
function nan_to_others (line 14) | def nan_to_others(df):
function seg_words (line 30) | def seg_words(contents):
function gen_text_vec (line 38) | def gen_text_vec(tokenizer, cut_corpus_list, maxlen):
function category_transpose (line 44) | def category_transpose(df, category_list):
function load_aspect_list (line 57) | def load_aspect_list(path_config):
function merge_excel (line 70) | def merge_excel(path_data_dir):
function remove_empty_row (line 93) | def remove_empty_row(df, column_name):
FILE: utils/grammar.py
function chinese_only (line 8) | def chinese_only(txt_list):
FILE: utils/utils.py
function delimiter (line 7) | def delimiter():