Sample code for implementing linear regression in Python
Author: von?Libniz  Published: 2021-04-26 14:54:47
Tags: python, linear regression
1 Linear Regression
1.1 Simple Linear Regression
In simple linear regression, the parameters a and b are adjusted to fit a linear relationship from x to y. The objective to be optimized is the MSE (Mean Squared Error), except that the averaging step (dividing by m) is omitted.
For simple linear regression there are only two parameters, a and b. Taking the extremum of the MSE objective (the least-squares method) yields the optimal a and b in closed form, so training a simple linear regression model amounts to computing these two parameter values from the data.
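The formulas referenced above were originally shown as images; written out, the objective (without the 1/m averaging) and its least-squares minimizer are:

```latex
J(a, b) = \sum_{i=1}^{m} \left( y^{(i)} - a x^{(i)} - b \right)^2,
\qquad
a = \frac{\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)\left( y^{(i)} - \bar{y} \right)}
         {\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)^2},
\qquad
b = \bar{y} - a \bar{x}
```

These are exactly the two expressions computed in the `fit` method below.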
Below, the feature at index 5 of the Boston housing dataset, RM (average number of rooms per dwelling), is used for simple linear regression. The evaluation metrics used are:
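The metric definitions (originally shown as an image) are, for m test samples with predictions ŷ:

```latex
\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)} - \hat{y}^{(i)} \right)^2,
\quad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\quad
\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left| y^{(i)} - \hat{y}^{(i)} \right|,
\quad
R^2 = 1 - \frac{\mathrm{MSE}}{\mathrm{Var}(y)}
```

Note that the R² form used in the code below divides MSE by the variance of the test targets, which is equivalent to the usual 1 − SS_res/SS_tot definition.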
# Wrap the simple linear regression algorithm in an sklearn-style interface
import numpy as np
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error

np.random.seed(123)


class SimpleLinearRegression():
    def __init__(self):
        """initialize model parameters"""
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        """
        train model parameters

        Parameters
        ----------
        x_train: train x, shape: [N,]
        y_train: train y, shape: [N,]
        """
        assert (x_train.ndim == 1 and y_train.ndim == 1), \
            "Simple Linear Regression model can only solve single-feature training data"
        assert len(x_train) == len(y_train), \
            "the size of x_train must be equal to y_train"
        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)
        # least-squares closed-form solution for a and b
        self.a_ = np.vdot((x_train - x_mean), (y_train - y_mean)) / np.vdot((x_train - x_mean), (x_train - x_mean))
        self.b_ = y_mean - self.a_ * x_mean

    def predict(self, input_x):
        """make predictions for a batch of data, input_x shape: [N,]"""
        assert input_x.ndim == 1, \
            "Simple Linear Regression model can only solve single-feature data"
        return np.array([self.pred_(x) for x in input_x])

    def pred_(self, x):
        """give a prediction for a single input x"""
        return self.a_ * x + self.b_

    def __repr__(self):
        return "SimpleLinearRegressionModel"
if __name__ == '__main__':
    # note: load_boston was removed in scikit-learn 1.2, so this requires an older version
    boston_data = datasets.load_boston()
    x = boston_data['data'][:, 5]  # total x data (506,)
    y = boston_data['target']  # total y data (506,)
    # keep data with target value less than 50.
    x = x[y < 50]  # total x data (490,)
    y = y[y < 50]  # total y data (490,)
    plt.scatter(x, y)
    plt.show()
    # train size: (343,), test size: (147,)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
    regs = SimpleLinearRegression()
    regs.fit(x_train, y_train)
    y_hat = regs.predict(x_test)
    rmse = np.sqrt(np.sum((y_hat - y_test) ** 2) / len(x_test))
    mse = mean_squared_error(y_test, y_hat)
    mae = mean_absolute_error(y_test, y_hat)
    # notice: R^2 computed from MSE and the variance of the test targets
    R_squared_Error = 1 - mse / np.var(y_test)
    print('mean squared error:%.2f' % (mse))
    print('root mean squared error:%.2f' % (rmse))
    print('mean absolute error:%.2f' % (mae))
    print('R squared Error:%.2f' % (R_squared_Error))
Output:
mean squared error:26.74
root mean squared error:5.17
mean absolute error:3.85
R squared Error:0.50
Data visualization (the scatter plot of RM against price produced by plt.scatter above; figure omitted).
1.2 Multiple Linear Regression
In multiple linear regression, each sample x has multiple features (the subscripted x values).
The model can then be written as a vector product of the feature vector and the parameter vector theta.
To simplify the computation, a constant feature equal to 1 is usually appended to x, so that the intercept (bias) can be folded into the same product.
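For example, the augmented matrix X_b with the extra column of ones can be built with np.hstack, exactly as the fit_normal method below does (a minimal sketch with made-up data):

```python
import numpy as np

# three samples with two features each
x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
# prepend a column of ones so the intercept becomes theta_0
X_b = np.hstack([np.ones((len(x), 1)), x])
print(X_b.shape)  # (3, 3): one bias column plus two feature columns
```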
The optimization objective of multiple linear regression is the same as that of simple linear regression.
Differentiating the objective with respect to theta in matrix form yields a closed-form solution, the normal equation theta = (X^T X)^-1 X^T y, although solving it has high time complexity.
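As an aside, explicitly inverting X^T X is both slow and numerically fragile; np.linalg.lstsq solves the same least-squares problem more stably. A small sketch on synthetic data (the variable names here are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
X_b = np.hstack([np.ones((100, 1)), x])      # prepend bias column
theta_true = np.array([4.0, 2.0, -1.0, 0.5])  # intercept + 3 coefficients
y = X_b.dot(theta_true)

# normal equation: theta = (X^T X)^-1 X^T y
theta_inv = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
# preferred in practice: solve the least-squares problem directly
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(np.allclose(theta_inv, theta_lstsq))  # both recover theta_true
```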
Below, the normal-equation solution is used to run multiple linear regression on all features of the Boston housing dataset.
import numpy as np
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
# PlayML is the author's own helper package providing the metric functions
from PlayML.metrics import r2_score
from PlayML.metrics import root_mean_squared_error

np.random.seed(123)


class LinearRegression():
    def __init__(self):
        self.coef_ = None  # coefficients
        self.intercept_ = None  # intercept
        self.theta_ = None

    def fit_normal(self, x_train, y_train):
        """
        use the normal equation solution for multiple linear regression:
        theta = (X^T * X)^-1 * X^T * y
        """
        assert x_train.shape[0] == y_train.shape[0], \
            "size of x_train must be equal to y_train"
        X_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        self.theta_ = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)  # (feature+1,)
        self.coef_ = self.theta_[1:]
        self.intercept_ = self.theta_[0]

    def predict(self, x_pred):
        """given a dataset x_pred to predict, return the result vector"""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert x_pred.shape[1] == len(self.coef_), \
            "the feature number of x_pred must be equal to x_train"
        X_b = np.hstack([np.ones((len(x_pred), 1)), x_pred])
        return X_b.dot(self.theta_)

    def score(self, x_test, y_test):
        """
        calculate the evaluation score (R^2)

        Parameters
        ----------
        x_test: x test data
        y_test: true labels y for the x test data
        """
        y_pred = self.predict(x_test)
        return r2_score(y_test, y_pred)

    def __repr__(self):
        return "LinearRegression"
if __name__ == '__main__':
    # use the Boston house price dataset for the test
    boston_data = datasets.load_boston()
    x = boston_data['data']  # total x data (506, 13)
    y = boston_data['target']  # total y data (506,)
    # keep data with target value less than 50.
    x = x[y < 50]  # total x data (490, 13)
    y = y[y < 50]  # total y data (490,)
    # train size: (343, 13), test size: (147, 13)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=123)
    regs = LinearRegression()
    regs.fit_normal(x_train, y_train)
    # calc error
    score = regs.score(x_test, y_test)
    rmse = root_mean_squared_error(y_test, regs.predict(x_test))
    print('R squared error:%.2f' % (score))
    print('Root mean squared error:%.2f' % (rmse))
Output:
R squared error:0.79
Root mean squared error:3.36
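Since PlayML is the author's own helper package (not available on PyPI), the two metrics it supplies can be sketched with plain NumPy, following the definitions used earlier in this article:

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the mean squared error"""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """R^2: 1 - MSE / Var(y_true)"""
    return 1 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(round(root_mean_squared_error(y_true, y_pred), 4))
print(round(r2_score(y_true, y_pred), 4))
```

This R^2 form matches sklearn.metrics.r2_score, since dividing MSE by Var(y_true) is the same as dividing the residual sum of squares by the total sum of squares.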
1.3 Using the Linear Regression Model from sklearn
import numpy as np
import sklearn.datasets as datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from PlayML.metrics import root_mean_squared_error

np.random.seed(123)

if __name__ == '__main__':
    # use the Boston house price dataset
    boston_data = datasets.load_boston()
    x = boston_data['data']  # total x data (506, 13)
    y = boston_data['target']  # total y data (506,)
    # keep data with target value less than 50.
    x = x[y < 50]  # total x data (490, 13)
    y = y[y < 50]  # total y data (490,)
    # train size: (343, 13), test size: (147, 13)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=123)
    regs = LinearRegression()
    regs.fit(x_train, y_train)
    # calc error
    score = regs.score(x_test, y_test)
    rmse = root_mean_squared_error(y_test, regs.predict(x_test))
    print('R squared error:%.2f' % (score))
    print('Root mean squared error:%.2f' % (rmse))
    print('coefficient:', regs.coef_.shape)
    print('intercept:', regs.intercept_.shape)
Output:
R squared error:0.79
Root mean squared error:3.36
coefficient: (13,)
intercept: ()
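One caveat when reproducing these results: load_boston was removed in scikit-learn 1.2 (for ethical reasons), so on a current install the pipeline needs a different dataset. A sketch using synthetic data from make_regression (the exact scores will of course differ from the article's):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 13 features to mirror the shape of the Boston data
x, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=123)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=123)
regs = LinearRegression()
regs.fit(x_train, y_train)
print('R squared error:%.2f' % regs.score(x_test, y_test))
print('coefficient shape:', regs.coef_.shape)
```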
Source: https://blog.csdn.net/Demon_LMMan/article/details/123114890