Understanding Social Media with Machine Learning.ppt

Understanding Social Media with Machine Learning
Xiaojin Zhu (jerryzhu@cs.wisc.edu)
Department of Computer Sciences, University of Wisconsin-Madison, USA
CCF/ADL Beijing 2013

Outline
1. Spatio-Temporal Signal Recovery from Social Media
2. Machine Learning Basics: Probability, Statistical Estimation, Decision Theory, Graphical Models, Regularization, Stochastic Processes
3. Socioscope: A Probabilistic Model for Social Media
4. Case Study: Roadkill

Part 1: Spatio-Temporal Signal Recovery from Social Media

Spatio-Temporal Signal: When, Where, How Much
- Direct instrumental sensing is difficult and expensive.

Humans as Sensors
- Not "hot trend" discovery: we already know what event we want to monitor.
- Not natural language processing for social media: we are given a reliable text classifier for "hit" posts.
- Our task: precisely estimating a spatio-temporal intensity function f_st of a pre-defined target phenomenon.

Challenges of Using Humans as Sensors
- A keyword doesn't always mean the event:
  - "I was just told I look like a dead crow."
  - "Don't blame me if one day I treat you like a dead crow."
- Human sensors aren't under our control.
- Location stamps may be erroneous or missing:
  - 3% have GPS coordinates: (-98.24, 23.22)
  - 47% have a valid user profile location: "Bristol, UK", "New York"
  - 50% don't have valid location information: "Hogwarts", "In the traffic ... blah", "Sitting On A Taco"

Problem Definition
- Input: a list of time and location stamps of the target posts.
- Output: f_st, the intensity of the target phenomenon at location s (e.g., New York) and time t (e.g., 0-1am).

Why Simple Estimation is Bad
- Naive estimate: f̂_st = x_st, the count of target posts in bin (s, t). (A numeric sketch follows below.)
- Justification: this is the MLE of the model x ~ Poisson(f).
- However:
  - Population bias: even if f_st = f_s't', more users in (s, t) means x_st > x_s't'.
  - Imprecise location: posts without location stamps, noisy user profile locations.
  - Zero/low counts: if we don't see tweets from Antarctica, does that mean there are no penguins there?
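To make the naive estimator concrete, here is a minimal sketch on synthetic data. The bin sizes, the uniform posting pattern, and the post array are all made up for illustration; in practice the input would be the time and location stamps of the classifier's "hit" posts.

```python
# Naive estimator f_hat[s, t] = x_st: bin stamped posts and use raw counts.
import numpy as np

rng = np.random.default_rng(0)
n_locations, n_hours = 5, 24

# Hypothetical posts: (location_id, hour) pairs.
posts = np.column_stack([
    rng.integers(0, n_locations, size=1000),
    rng.integers(0, n_hours, size=1000),
])

# This is the MLE under x_st ~ Poisson(f_st), but it inherits the population
# bias, location noise, and zero-count problems listed above.
f_hat = np.zeros((n_locations, n_hours))
np.add.at(f_hat, (posts[:, 0], posts[:, 1]), 1)

print(f_hat[0])  # estimated intensity at location 0 across the day
```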

Part 2: Machine Learning Basics

Probability
- The probability of a discrete random variable A taking the value a is P(A = a) ∈ [0, 1], sometimes written as P(a) when there is no danger of confusion.
- Normalization: Σ_a P(A = a) = 1.
- Joint probability: P(A = a, B = b) = P(a, b), the probability that the two events happen at the same time.
- Marginalization ("summing out B"): P(A = a) = Σ_b P(a, b).
- The product rule: P(a, b) = P(a) P(b|a) = P(b) P(a|b). (A small numeric check of these rules follows.)
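A minimal numeric check of the rules above on a 2x2 joint table; the probabilities are made up for illustration.

```python
# Normalization, marginalization, and the product rule on a toy joint P(A, B).
import numpy as np

P = np.array([[0.1, 0.2],   # P(A=0, B=0), P(A=0, B=1)
              [0.3, 0.4]])  # P(A=1, B=0), P(A=1, B=1)
assert np.isclose(P.sum(), 1.0)        # normalization

P_A = P.sum(axis=1)                    # marginalization: sum out B
P_B_given_A = P / P_A[:, None]         # conditional P(b|a)
# product rule reassembles the joint: P(a, b) = P(a) P(b|a)
assert np.allclose(P_A[:, None] * P_B_given_A, P)
```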

Bayes Rule
- P(a|b) = P(b|a) P(a) / P(b).
- In general, P(a|b, C) = P(b|a, C) P(a|C) / P(b|C), where C can be one or more random variables.
- Bayesian approach: when θ is the model parameter and D is the observed data, we have
    p(θ|D) = p(D|θ) p(θ) / p(D)
  - p(θ) is the prior;
  - p(D|θ) is the likelihood function (of θ; not normalized: ∫ p(D|θ) dθ ≠ 1);
  - p(D) = ∫ p(D|θ) p(θ) dθ is the evidence;
  - p(θ|D) is the posterior. (A grid-based numeric example follows.)
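A sketch of the Bayesian update on a grid of candidate coin biases θ; the prior, grid, and data counts are illustrative.

```python
# p(theta|D) ∝ p(D|theta) p(theta), normalized by the evidence p(D).
import numpy as np

theta = np.linspace(0.01, 0.99, 99)            # candidate parameter values
prior = np.ones_like(theta) / theta.size       # uniform prior p(theta)

heads, tails = 7, 3                            # hypothetical observed data D
likelihood = theta**heads * (1 - theta)**tails # p(D|theta), not normalized

evidence = np.sum(likelihood * prior)          # p(D)
posterior = likelihood * prior / evidence      # p(theta|D)

print(theta[np.argmax(posterior)])             # posterior mode, 0.7
```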

Independence
- The product rule simplifies to P(a, b) = P(a) P(b) iff A and B are independent.
- Equivalently, P(a|b) = P(a) and P(b|a) = P(b).

Probability Density
- A continuous random variable x has a probability density function (pdf) p(x) ∈ [0, ∞); p(x) > 1 is possible!
- It integrates to 1: ∫ p(x) dx = 1.
- P(x1 ≤ X ≤ x2) = ∫_{x1}^{x2} p(x) dx.
- Marginalization: p(x) = ∫ p(x, y) dy.

Expectation and Variance
- The expectation ("mean" or "average") of a function f under the probability distribution P is E_P[f] = Σ_a P(a) f(a); under a density p it is E_p[f] = ∫ p(x) f(x) dx.
- In particular, if f(x) = x, this is the mean of the random variable x.
- The variance of f is Var(f) = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - (E[f(x)])^2.
- The standard deviation is std(f) = √Var(f).

Multivariate Statistics
- When x, y are vectors, E[x] is the mean vector and Cov(x, y) is the covariance matrix with (i, j)-th entry Cov(x_i, y_j):
    Cov(x, y) = E_{x,y}[(x - E[x])(y - E[y])^T] = E_{x,y}[x y^T] - E[x] E[y]^T.
  (Both identities are checked numerically below.)
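A Monte Carlo check of the variance and covariance identities above; the sample size and the linear map producing correlated vectors are illustrative.

```python
# Check Var(f) = E[f^2] - (E f)^2 and Cov(x, y) = E[x y^T] - E[x] E[y]^T.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100_000, 2))
y = x @ np.array([[1.0, 0.5], [0.0, 1.0]])    # y is correlated with x

var_direct = np.mean((x[:, 0] - x[:, 0].mean())**2)
var_identity = np.mean(x[:, 0]**2) - x[:, 0].mean()**2
print(var_direct, var_identity)                # both ~1, they agree

cov = x.T @ y / len(x) - np.outer(x.mean(0), y.mean(0))
print(cov)                                     # sample version of Cov(x, y)
```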

Some Discrete Distributions
- Point mass at a: P(X = a) = 1.
- Binomial, with n (number of trials) and p (head probability):
    f(x) = C(n, x) p^x (1 - p)^{n-x} for x = 0, 1, ..., n; 0 otherwise.
- Bernoulli: binomial with n = 1.
- Multinomial, with p = (p_1, ..., p_d) (a d-sided die):
    f(x) = (n choose x_1, ..., x_d) Π_{k=1}^d p_k^{x_k} if Σ_{k=1}^d x_k = n; 0 otherwise.

More Discrete Distributions
- Poisson: X ~ Poisson(λ) if f(x) = e^{-λ} λ^x / x! for x = 0, 1, 2, ..., with λ the rate or intensity parameter.
- Mean: λ; variance: λ. If X_1 ~ Poisson(λ_1) and X_2 ~ Poisson(λ_2), then X_1 + X_2 ~ Poisson(λ_1 + λ_2). (Both facts are checked by sampling below.)
- This is a distribution on unbounded counts with a probability mass function "hump" (mode at ⌈λ⌉ - 1).
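A sampling check of the Poisson facts above; the rates and sample size are illustrative.

```python
# Mean = variance = lambda, and sums of independent Poissons are Poisson.
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 3.0, 4.5
x1 = rng.poisson(lam1, size=200_000)
x2 = rng.poisson(lam2, size=200_000)

print(x1.mean(), x1.var())   # both ~3.0
s = x1 + x2
print(s.mean(), s.var())     # both ~7.5, consistent with Poisson(7.5)
```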

Some Continuous Distributions
- Gaussian (Normal): X ~ N(μ, σ^2), with parameters μ ∈ R (the mean) and σ^2 (the variance):
    f(x) = (1/√(2πσ^2)) exp(-(x - μ)^2 / (2σ^2)).
- σ is the standard deviation.
- If μ = 0 and σ = 1, X has a standard normal distribution.
- If X_i ~ N(μ_i, σ_i^2) independently, then Z = Σ_i (X_i - μ_i)^2 / σ_i^2 has a chi-squared distribution.

Multivariate Gaussian
- Let x, μ ∈ R^d and Σ ∈ S_+^d, a symmetric, positive definite matrix of size d × d. Then X ~ N(μ, Σ) with pdf
    f(x) = (2π)^{-d/2} |Σ|^{-1/2} exp(-(1/2) (x - μ)^T Σ^{-1} (x - μ)),
  where |Σ| is the determinant of Σ and Σ^{-1} its inverse.

Marginal and Conditional of Gaussian
- If two (groups of) variables x, y are jointly Gaussian,
    (x, y) ~ N((μ_x, μ_y), [[A, C], [C^T, B]])    (1)
- (Marginal) x ~ N(μ_x, A)
- (Conditional) y|x ~ N(μ_y + C^T A^{-1} (x - μ_x), B - C^T A^{-1} C). (A numeric example follows.)
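To make formula (1) concrete, a minimal sketch of the conditional with scalar blocks A, B, C (so x and y are one-dimensional and C^T = C); all numbers are illustrative.

```python
# Conditioning a 2-D joint Gaussian: y | x via the block formulas in (1).
mu_x, mu_y = 1.0, -2.0
A, B, C = 2.0, 1.5, 0.8                    # blocks of the joint covariance

x_obs = 2.0                                # condition on x = 2
cond_mean = mu_y + C / A * (x_obs - mu_x)  # mu_y + C A^{-1} (x - mu_x)
cond_var = B - C / A * C                   # B - C A^{-1} C
print(cond_mean, cond_var)                 # y | x=2 ~ N(-1.6, 1.18)
```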

More Continuous Distributions
- The Gamma function Γ(α) = ∫_0^∞ x^{α-1} e^{-x} dx for α > 0. It generalizes the factorial: Γ(n) = (n - 1)! when n is a positive integer, and Γ(α + 1) = α Γ(α) for α > 0.
- The Gamma distribution, with shape parameter α > 0 and scale parameter β > 0:
    f(x) = (1 / (Γ(α) β^α)) x^{α-1} e^{-x/β}, for x ≥ 0.
- The Gamma distribution is the conjugate prior for the Poisson rate. (The update is sketched below.)
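A sketch of the conjugate update under the shape/scale parametrization above: with a Gamma(α, β) prior on a Poisson rate λ and observed counts x_1, ..., x_n, the posterior is Gamma(α + Σ x_i, β / (nβ + 1)). The prior and counts are illustrative.

```python
# Gamma prior + Poisson likelihood -> Gamma posterior (closed form).
import numpy as np

alpha, beta = 2.0, 1.0             # prior shape and scale
x = np.array([3, 5, 4, 2])         # hypothetical observed counts

alpha_post = alpha + x.sum()
beta_post = beta / (len(x) * beta + 1)
print(alpha_post * beta_post)      # posterior mean of lambda, 3.2
```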

Statistical Estimation

Parametric Models
- A statistical model H is a set of distributions. In machine learning, we call H the hypothesis space.
- A parametric model can be parametrized by a finite number of parameters: f(x) ≡ f(x; θ) with parameter θ ∈ Θ ⊆ R^d:
    H = {f(x; θ) : θ ∈ Θ},
  where Θ is the parameter space.
- We denote the expectation under the model by
    E_θ(g) = ∫ g(x) f(x; θ) dx.
  E_θ means E_{x ~ f(x;θ)}, not an expectation over different θ's.
- "All (parametric) models are wrong. Some are more useful than others."

Nonparametric Models
- A nonparametric model cannot be parametrized by a fixed number of parameters; model complexity grows indefinitely with sample size.
- Example: H = {P : Var_P(X) < ∞}. Given iid data x_1, ..., x_n, the optimal estimator of the mean is again (1/n) Σ_i x_i.
- Nonparametric models make weaker assumptions and are thus preferred; but parametric models converge faster and are more practical.

Estimation
- A point estimator θ̂_n of a parameter θ is a function of the sample X_1, ..., X_n that attempts to estimate θ. This is the "learning" in machine learning!
- Example: in classification, X_i = (x_i, y_i) and θ̂_n is the learned model.
- Consistent estimators learn the correct model with more training data, eventually.

Bias and Standard Error
- The expectation E is w.r.t. the joint distribution f(x_1, ..., x_n; θ) = Π_{i=1}^n f(x_i; θ).
- The bias of an estimator is bias(θ̂_n) = E(θ̂_n) - θ. An estimator is unbiased if bias(θ̂_n) = 0.
- The standard error of an estimator is se(θ̂_n) = √Var(θ̂_n).
- Example: let θ̂ = (1/n) Σ_i x_i, where x_i ~ N(0, 1). The standard deviation of each x_i is 1 regardless of n; in contrast, se(θ̂) = 1/√n = n^{-1/2}, which decreases with n.

MSE
- The mean squared error of an estimator is mse(θ̂_n) = E(θ̂_n - θ)^2.
- Bias-variance decomposition (verified numerically below):
    mse(θ̂_n) = bias^2(θ̂_n) + se^2(θ̂_n) = bias^2(θ̂_n) + Var(θ̂_n).
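A simulation check of the decomposition for two estimators of a Gaussian mean, one unbiased and one deliberately shrunk; the estimators, sample size, and trial count are illustrative.

```python
# mse = bias^2 + variance, estimated over many repeated samples.
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 1.0, 20, 100_000
samples = rng.normal(theta, 1.0, size=(trials, n))

for est in (samples.mean(axis=1),           # unbiased: sample mean
            0.9 * samples.mean(axis=1)):    # biased: shrunk mean
    bias = est.mean() - theta
    var = est.var()
    mse = np.mean((est - theta)**2)
    print(mse, bias**2 + var)               # the two columns agree
```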

Maximum Likelihood
- Let x_1, ..., x_n ~ f(x; θ) where θ ∈ Θ. The likelihood function is
    L_n(θ) = f(x_1, ..., x_n; θ) = Π_{i=1}^n f(x_i; θ).
- The log likelihood function is ℓ_n(θ) = log L_n(θ).
- The maximum likelihood estimator (MLE) is θ̂_n = argmax_{θ∈Θ} L_n(θ) = argmax_{θ∈Θ} ℓ_n(θ).

MLE Examples
- The MLE for p(head) from n coin flips is count(head)/n.
- For a Gaussian, the MLE is μ̂ = (1/n) Σ_i X_i and σ̂^2 = (1/n) Σ_i (X_i - μ̂)^2.
- The MLE does not always agree with intuition: the MLE for X_1, ..., X_n ~ uniform(0, θ) is θ̂ = max(X_1, ..., X_n), illustrated below.
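The counterintuitive uniform example, numerically: the likelihood is θ^{-n} for θ ≥ max(x_i) and 0 otherwise, so it is maximized at max(x_i). The true θ and sample size are illustrative.

```python
# MLE for uniform(0, theta) is the sample maximum.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 5.0
x = rng.uniform(0, theta_true, size=50)

theta_hat = x.max()      # the MLE; it never exceeds the true theta
print(theta_hat)         # ~4.9: biased low, but consistent as n grows
```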

Properties of the MLE
- When H is identifiable, under certain conditions (see Wasserman) the MLE converges in probability to the true parameter θ; that is, the MLE is consistent.
- Asymptotic normality: let se = √(1/I_n(θ)), where I_n(θ) is the Fisher information; then (θ̂_n - θ)/se converges in distribution to N(0, 1).
- The MLE is asymptotically efficient (achieves the Cramer-Rao lower bound): it is the "best" among unbiased estimators.

Frequentist Statistics
- Probability refers to limiting relative frequency. Data are random.
- Estimators are random because they are functions of the data.
- Parameters are fixed, unknown constants not subject to probabilistic statements.
- Procedures are subject to probabilistic statements; for example, 95% confidence intervals trap the true parameter value 95% of the time.
- Classifiers, even those learned with deterministic procedures, are random because the training set is random. The PAC bound is frequentist.
- Most procedures in machine learning are frequentist methods.

Bayesian Statistics
- Probability refers to degree of belief.
- Inference about a parameter is done by producing a probability distribution on it.
- Start with a prior distribution p(θ). The likelihood function p(x | θ) is viewed as a function of θ, not of x.
- After observing data x, apply the Bayes rule to obtain the posterior
    p(θ | Data) = (1/Z) p(Data | θ) p(θ),
  where Z = ∫ p(Data | θ) p(θ) dθ is the evidence.
- Prediction by integrating the parameters out (sketched below):
    p(x | Data) = ∫ p(x | θ) p(θ | Data) dθ.
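A sketch of prediction by integrating parameters out, using the Beta-Bernoulli pair where the integral has a closed form: with a Beta(a, b) prior on p(head) and observed head/tail counts, p(next = head | Data) is the posterior mean. The prior and counts are illustrative.

```python
# Posterior predictive for a coin: integrate theta out of p(x | theta).
a, b = 1.0, 1.0            # uniform Beta prior on p(head)
heads, tails = 7, 3        # observed data

# p(next = head | Data) = ∫ theta p(theta | Data) d theta
p_head = (a + heads) / (a + b + heads + tails)
print(p_head)              # 8/12 ~ 0.667, the posterior mean
```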
