Technology, November 10, 2022

Reshaping and Pivoting


import pandas as pd
import numpy as np
from pandas import Series

data = pd.DataFrame(np.arange(6).reshape(2, 3),
                    index=['Ohio', 'Colorado'],
                    columns=['one', 'two', 'three'])
data.index.names=['state']
data.columns.names=['number']
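data
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

# stack() pivots the columns into the rows: each column label becomes the innermost level of a hierarchical row index, giving a Series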
result = data.stack()
result
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

# unstack() rearranges the Series back into a DataFrame
result.unstack()
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

# By default, unstack operates on the innermost level (here, number); pass a level number or name to unstack a different level
result.unstack('state')
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
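To make the relationship between the two directions concrete, here is a minimal sketch that rebuilds `data` from above. It sets the axis names with `pd.Index(..., name=...)` instead of assigning `.names` afterwards. The sketch checks two things: that `unstack` undoes `stack`, and that unstacking the outer `state` level is simply a transpose for this small frame. `check_like=True` is used so the checks do not depend on label ordering, which I am not certain is identical across pandas versions.

```python
import numpy as np
import pandas as pd

# Rebuild the small frame from this section: rows are states, columns are numbers.
data = pd.DataFrame(np.arange(6).reshape(2, 3),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# stack() moves the column labels into a new, innermost row level.
result = data.stack()

# unstack() with no argument pivots that innermost level back into columns,
# so the round trip reproduces the original frame.
pd.testing.assert_frame_equal(result.unstack(), data, check_like=True)

# Unstacking the outer level ('state', i.e. level 0) instead yields the transpose.
by_state = result.unstack('state')
pd.testing.assert_frame_equal(by_state, data.T, check_like=True)

# Level number and level name select the same level.
pd.testing.assert_frame_equal(result.unstack(0), by_state)
```

If any assertion failed, `assert_frame_equal` would raise an `AssertionError` describing the mismatch; silence means the reshapes agree.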
# Passing the level number 0 instead of the name 'state' gives the same result: the state level becomes the columns of the DataFrame
result.unstack(0)
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5

# If not every inner label is present in every group, unstack introduces missing data
s1 = Series([0,1,2,3],index=['a','b','c','d'])
s2 = Series([4,5,6], index=['c','d','e'])
data2 = pd.concat([s1,s2],keys=['one','two'])
data2
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

data2.unstack()
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

# stack, by contrast, filters out missing data by default; pass dropna=False to keep it
data2.unstack().stack()
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64

# With dropna=False the NaN entries are kept
data2.unstack().stack(dropna=False)
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64

# stack and unstack on a DataFrame
result
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))
df
side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10

# When you unstack a DataFrame, the level being unstacked becomes the lowest level of the resulting column index
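df.unstack('state')
side   left           right
state  Ohio Colorado  Ohio Colorado
number
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10

# Likewise, stack accepts the name of the level to stack: here side is folded into the rows and becomes the lowest row level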
df.unstack('state').stack('side')
state         Colorado  Ohio
number side
one    left          3     0
       right         8     5
two    left          4     1
       right         9     6
three  left          5     2
       right        10     7
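The `data2` example above shows that `unstack` inserts NaN wherever an inner label is absent from a group. If a placeholder makes more sense for your data, `unstack` also accepts a `fill_value` argument. The sketch below rebuilds `s1`, `s2` and `data2` and compares the two results; the choice of `0` as the placeholder is only an assumption for illustration. The sketch deliberately avoids calling `stack` on the NaN-containing frame. As far as I know, pandas 2.1 added a new `stack` implementation (opt-in via `future_stack=True`) that keeps missing values, so the default dropping shown above may depend on your pandas version.

```python
import pandas as pd

# Same two partially overlapping Series as above.
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# Default: combinations missing from a group become NaN.
with_nan = data2.unstack()

# fill_value puts a placeholder (0 here, an arbitrary choice) in those cells instead.
filled = data2.unstack(fill_value=0)

print(with_nan)
print(filled)
```

Which option is right depends on whether "absent" and "zero" mean the same thing in your data; for counts they often do, for measurements they usually do not.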

Time Series Data in Stacked Format

data_c = [
['1959-03-31','realgdb',2710.349],
['1959-03-31','infl',0.000],
['1959-03-31','unemp',5.800],
['1959-06-30','realgdb',2778.801],
['1959-06-30','infl',2.340],
['1959-06-30','unemp',5.100],
['1959-09-30','realgdb',2775.488],
['1959-09-30','infl',2.740],
['1959-09-30','unemp',5.300],
]
ldata = pd.DataFrame(data_c,columns=['data','item','value'])
ldata
         data     item     value
0  1959-03-31  realgdb  2710.349
1  1959-03-31     infl     0.000
2  1959-03-31    unemp     5.800
3  1959-06-30  realgdb  2778.801
4  1959-06-30     infl     2.340
5  1959-06-30    unemp     5.100
6  1959-09-30  realgdb  2775.488
7  1959-09-30     infl     2.740
8  1959-09-30    unemp     5.300

# To use data as the row index and item as the column index, the simplest option is the pivot shortcut
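# index, columns and values must be passed as keyword arguments in pandas 2.0 and later
ldata.pivot(index='data', columns='item', values='value')
item        infl   realgdb  unemp
data
1959-03-31  0.00  2710.349    5.8
1959-06-30  2.34  2778.801    5.1
1959-09-30  2.74  2775.488    5.3

# pivot is shorthand for the following two steps; under the hood it is still a stack/unstack reshape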
# Step 1: build a hierarchical row index from the data and item columns
ldata.set_index(['data', 'item'])
                       value
data       item
1959-03-31 realgdb  2710.349
           infl        0.000
           unemp       5.800
1959-06-30 realgdb  2778.801
           infl        2.340
           unemp       5.100
1959-09-30 realgdb  2775.488
           infl        2.740
           unemp       5.300

# Step 2: unstack the innermost level (item) into the columns
ldata.set_index(['data', 'item']).unstack()
           value
item        infl   realgdb  unemp
data
1959-03-31  0.00  2710.349    5.8
1959-06-30  2.34  2778.801    5.1
1959-09-30  2.74  2775.488    5.3
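The claim that pivot amounts to set_index followed by unstack can be checked directly. The minimal sketch below uses a trimmed copy of the `ldata` records (two dates only, column names kept as in the example) and assumes pandas 2.x-style keyword arguments for `pivot`. Selecting the `value` column as a Series before unstacking reproduces pivot's flat columns exactly. Unstacking the whole DataFrame, as in step 2 above, gives the same numbers under an extra `value` column level.

```python
import pandas as pd

# Trimmed long-format records from the example above.
ldata = pd.DataFrame(
    [['1959-03-31', 'realgdb', 2710.349],
     ['1959-03-31', 'infl', 0.000],
     ['1959-03-31', 'unemp', 5.800],
     ['1959-06-30', 'realgdb', 2778.801],
     ['1959-06-30', 'infl', 2.340],
     ['1959-06-30', 'unemp', 5.100]],
    columns=['data', 'item', 'value'])

# One-step reshape with the pivot shortcut (keyword arguments).
wide = ldata.pivot(index='data', columns='item', values='value')

# Two-step equivalent: hierarchical row index, select the 'value'
# column as a Series, then move the 'item' level into the columns.
two_step = ldata.set_index(['data', 'item'])['value'].unstack('item')
pd.testing.assert_frame_equal(wide, two_step)

# Unstacking the whole DataFrame instead keeps 'value' as an extra
# top level in the column MultiIndex.
nested = ldata.set_index(['data', 'item']).unstack('item')
print(nested.columns.nlevels)  # 2
```

In practice, pivot is the more readable spelling when there is a single value column. The set_index/unstack route is handy when you want to keep several value columns side by side under a column MultiIndex.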