首页技术正文

深度学习之NLP获取词向量

开发者玉宁技术 2022年11月20日

0 收藏 434 点赞 2,842 浏览 566 个字

1、代码

def clean_text(text, remove_stopwords=False):
    """
    数据清洗
    """
    text = BeautifulSoup(text, 'html.parser').get_text()
    text = re.sub(r'[^a-zA-Z]', ' ', text)
    words = text.lower().split()
    if remove_stopwords:
        words = [w for w in words if w not in eng_stopwords]
    return wordsdef to_review_vector(review):
    """
    获取词向量
    """
    global word_vec    review = clean_text(review, remove_stopwords=True)
    #print (review)
    #words = nltk.word_tokenize(review)
    word_vec = np.zeros((1,300))
    for word in review:
        #word_vec = np.zeros((1,300))
        if word in model:
            word_vec += np.array([model[word]])
    #print (word_vec.mean(axis = 0))
    return pd.Series(word_vec.mean(axis = 0))

点赞 434

split za 代码向量数据

开发者玉宁

贡献者

上一篇： NLP之词向量

下一篇： JavaScript高级程序设计（第三版）学习笔记22、24、25章

相关推荐

python开发_常用的python模块及安装方法

python开发_常用的python模块及安装方法

adodb：我们领导推荐的数据库连接组件bsddb3：BerkeleyDB的连接组件Cheetah-1.0：我比较喜欢这个版本的cheeta…

程序员润宾技术

日期：2022-11-24 点赞：878 阅读：9,105

Educational Codeforces Round 11 C. Hard Process 二分

Educational Codeforces Round 11 C. Hard Process 二分

C. Hard Process题目连接：http://www.codeforces.com/contest/660/problem/CDes…

程序员春广技术

日期：2022-11-24 点赞：807 阅读：5,582

下载Ubuntn 17.04 内核源代码

下载Ubuntn 17.04 内核源代码

zengkefu@server1:/usr/src$ uname -aLinux server1 4.10.0-19-generic #21…

程序员峰军技术

日期：2022-11-24 点赞：569 阅读：6,429

可用Active Desktop Calendar V7.86 注册码序列号

可用Active Desktop Calendar V7.86 注册码序列号

可用Active Desktop Calendar V7.86 注册码序列号Name: www.greendown.cn Code: &nb…

程序员天赐技术

日期：2022-11-24 点赞：733 阅读：6,200

Android调用系统相机、自定义相机、处理大图片

Android调用系统相机、自定义相机、处理大图片

Android调用系统相机和自定义相机实例本博文主要是介绍了android上使用相机进行拍照并显示的两种方式，并且由于涉及到要把拍到的照片显…

程序员爱鹏技术

日期：2022-11-24 点赞：512 阅读：7,836

Struts的使用

一、Struts2的获取　　Struts的官方网站为：http://struts.apache.org/　　下载完Struts2的jar包,…

程序员红卫技术

日期：2022-11-24 点赞：671 阅读：4,919