99999久久久久久亚洲,欧美人与禽猛交狂配,高清日韩av在线影院,一个人在线高清免费观看,啦啦啦在线视频免费观看www

熱線電話:13121318867

登錄
首頁精彩閱讀R語言作為BI中ETL的工具
R語言作為BI中ETL的工具
2017-05-14
收藏

R語言作為BI中ETL的工具

R語言作為BI中ETL的工具,增刪改

R語言提供了強(qiáng)大的R_package與各種數(shù)據(jù)庫進(jìn)行數(shù)據(jù)交互。
外加其強(qiáng)大數(shù)據(jù)變換清洗函數(shù),為ETL提供一條方便快捷的道路。
RODBC
ROracal
RMysql
Rmongodb

    step1 新建連接con,并查看其信息

library(RODBC)
con<-odbcConnect("LI")
con  

RODBC Connection 1
Details:
  case=nochange
  DSN=LI
  UID=
  Trusted_Connection=Yes
  APP=RStudio
  WSID=LIYI-PC

    step2 引入數(shù)據(jù)集USArrests

data(USArrests)
head(USArrests)
           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7

    step3 將USArrests保存到sqlserver

sqlSave(con, USArrests)

sqlSave(channel, dat, tablename = NULL, append = FALSE,
        rownames = TRUE, colnames = FALSE, verbose = FALSE,
        safer = TRUE, addPK = FALSE, typeInfo, varTypes,
        fast = TRUE, test = FALSE, nastring = NULL)
channel      數(shù)據(jù)庫連接通道
dat              data frame.要存入的數(shù)據(jù)集
tablename  character 數(shù)據(jù)庫中表名
index          character 索引列的名字
append       logical邏輯變量 ,是否數(shù)據(jù)集添加/寫入已存在的表
rownames  logical 邏輯變量 or character字符串,如果是logical,表示是否把rowname這個(gè)字符串作為數(shù)據(jù)庫表首列列名,如果是字符串,則將新字符串作為表首列列名
colnames  logical 邏輯變量 是否將數(shù)據(jù)集的列名保留作為表的首行 (謹(jǐn)慎更改,可能導(dǎo)致,列名變成數(shù)據(jù)第一行,各列的數(shù)據(jù)類型全部變?yōu)関archar(255)
verbose   display statements as they are sent to the server?
safer     logical邏輯變量.如果真,生成一個(gè)新表,不在已存在的表后添加。如果假,強(qiáng)制刪除已存在的同名表并新建,或者刪除表中已存在的數(shù)據(jù),覆蓋寫入
addPK     logical邏輯變量,是否將首列作為主鍵
typeInfo  list 數(shù)據(jù)框中數(shù)據(jù)類型。包括character ,double ,integer
varTypes  an optional named character vector giving the DBMSs datatypes to be used for some (or all) of the columns if a table is to be created.可選項(xiàng)涉及各列數(shù)據(jù)類型轉(zhuǎn)換,因?yàn)閿?shù)據(jù)庫中的數(shù)據(jù)類型比R語言中的要多很多。
fast      logical. If false, write data a row at a time. If true, use a parametrized INSERT INTO or UPDATE query to write all the data in one operation. 邏輯變量,如果F,數(shù)據(jù)將一次一行地寫入,如果為Ture,將用到變量插入INSERT INTO,或者UPDATE 將數(shù)據(jù)一次性寫入
nastring     optional character string to be used for writing NAs to the database. 選擇哪種字符串,將缺失項(xiàng)在數(shù)據(jù)庫中填充

getSqlTypeInfo("Microsoft SQL Server")
$double
[1] "float"

$integer
[1] "int"

$character
[1] "varchar(255)"

$logical
[1] "varchar(5)"

sqlSave(con, USArrests,rownames = "city", addPK = T) # 原沒有列名的rownames改名為city,并設(shè)置首列為主鍵key

sqlSave(con, USArrests,'USA2',rownames = "city", addPK = T,fast=T,test=T)  #注意此操作可能在數(shù)據(jù)庫中create名為USA2的空表

Binding: 'city' DataType 12, ColSize 255
Binding: 'Murder' DataType 6, ColSize 53
Binding: 'Assault' DataType 4, ColSize 10
Binding: 'UrbanPop' DataType 4, ColSize 10
Binding: 'Rape' DataType 6, ColSize 53
Parameters:
no: 1: city Alabama/***/no: 2: Murder 13.2/***/no: 3: Assault 236/***/no: 4: UrbanPop 58/***/no: 5: Rape 21.2/***/
no: 1: city Alaska/***/no: 2: Murder 10/***/no: 3: Assault 263/***/no: 4: UrbanPop 48/***/no: 5: Rape 44.5/***/
no: 1: city Arizona/***/no: 2: Murder 8.1/***/no: 3: Assault 294/***/no: 4: UrbanPop 80/***/no: 5: Rape 31/***/
no: 1: city Arkansas/***/no: 2: Murder 8.8/***/no: 3: Assault 190/***/no: 4: UrbanPop 50/***/no: 5: Rape 19.5/***/
no: 1: city California/***/no: 2: Murder 9/***/no: 3: Assault 276/***/no: 4: UrbanPop 91/***/no: 5: Rape 40.6/***/
no: 1: city Colorado/***/no: 2: Murder 7.9/***/no: 3: Assault 204/***/no: 4: UrbanPop 78/***/no: 5: Rape 38.7/***/
no: 1: city Connecticut/***/no: 2: Murder 3.3/***/no: 3: Assault 110/***/no: 4: UrbanPop 77/***/no: 5: Rape 11.1/***/
no: 1: city Delaware/***/no: 2: Murder 5.9/***/no: 3: Assault 238/***/no: 4: UrbanPop 72/***/no: 5: Rape 15.8/***/
no: 1: city Florida/***/no: 2: Murder 15.4/***/no: 3: Assault 335/***/no: 4: UrbanPop 80/***/no: 5: Rape 31.9/***/
no: 1: city Georgia/***/no: 2: Murder 17.4/***/no: 3: Assault 211/***/no: 4: UrbanPop 60/***/no: 5: Rape 25.8/***/
no: 1: city Hawaii/***/no: 2: Murder 5.3/***/no: 3: Assault 46/***/no: 4: UrbanPop 83/***/no: 5: Rape 20.2/***/
no: 1: city Idaho/***/no: 2: Murder 2.6/***/no: 3: Assault 120/***/no: 4: UrbanPop 54/***/no: 5: Rape 14.2/***/
no: 1: city Illinois/***/no: 2: Murder 10.4/***/no: 3: Assault 249/***/no: 4: UrbanPop 83/***/no: 5: Rape 24/***/
no: 1: city Indiana/***/no: 2: Murder 7.2/***/no: 3: Assault 113/***/no: 4: UrbanPop 65/***/no: 5: Rape 21/***/
no: 1: city Iowa/***/no: 2: Murder 2.2/***/no: 3: Assault 56/***/no: 4: UrbanPop 57/***/no: 5: Rape 11.3/***/
no: 1: city Kansas/***/no: 2: Murder 6/***/no: 3: Assault 115/***/no: 4: UrbanPop 66/***/no: 5: Rape 18/***/
# 此處省略10000字

sqlColumns    Enquire about the column structure of tables on an ODBC database connection. 訪問數(shù)據(jù)庫表的結(jié)構(gòu)

columnsenquire<-sqlColumns(con,'USA2')

str(columnsenquire)

str(columnsenquire)
'data.frame':   5 obs. of  29 variables:
 $ TABLE_CAT                           : chr  "master" "master" "master" "master" ...
 $ TABLE_SCHEM                         : chr  "dbo" "dbo" "dbo" "dbo" ...
 $ TABLE_NAME                          : chr  "USA2" "USA2" "USA2" "USA2" ...
 $ COLUMN_NAME                         : chr  "city" "Murder" "Assault" "UrbanPop" ...
 $ DATA_TYPE                           : int  12 6 4 4 6
 $ TYPE_NAME                           : chr  "varchar" "float" "int" "int" ...
 $ COLUMN_SIZE                         : int  255 53 10 10 53
 $ BUFFER_LENGTH                       : int  255 8 4 4 8
 $ DECIMAL_DIGITS                      : int  NA NA 0 0 NA
 $ NUM_PREC_RADIX                      : int  NA 2 10 10 2
 $ NULLABLE                            : int  0 1 1 1 1
 $ REMARKS                             : chr  NA NA NA NA ...
 $ COLUMN_DEF                          : chr  NA NA NA NA ...
 $ SQL_DATA_TYPE                       : int  12 6 4 4 6
 $ SQL_DATETIME_SUB                    : int  NA NA NA NA NA
 $ CHAR_OCTET_LENGTH                   : int  255 NA NA NA NA
 $ ORDINAL_POSITION                    : int  1 2 3 4 5
 $ IS_NULLABLE                         : chr  "NO" "YES" "YES" "YES" ...
 $ SS_IS_SPARSE                        : int  0 0 0 0 0
 $ SS_IS_COLUMN_SET                    : int  0 0 0 0 0
 $ SS_IS_COMPUTED                      : int  0 0 0 0 0
 $ SS_IS_IDENTITY                      : int  0 0 0 0 0
 $ SS_UDT_CATALOG_NAME                 : chr  NA NA NA NA ...
 $ SS_UDT_SCHEMA_NAME                  : chr  NA NA NA NA ...
 $ SS_UDT_ASSEMBLY_TYPE_NAME           : chr  NA NA NA NA ...
 $ SS_XML_SCHEMACOLLECTION_CATALOG_NAME: chr  NA NA NA NA ...
 $ SS_XML_SCHEMACOLLECTION_SCHEMA_NAME : chr  NA NA NA NA ...
 $ SS_XML_SCHEMACOLLECTION_NAME        : chr  NA NA NA NA ...
 $ SS_DATA_TYPE                        : chr  "39" "109" "38" "38" ...

    step4 查詢數(shù)據(jù)

    sqlQuery(con,'select * from USArrests')

    # 注意此時(shí)第一列的名字已經(jīng)為city了
             city Murder Assault UrbanPop Rape
    1         Alabama   13.2     236       58 21.2
    2          Alaska   10.0     263       48 44.5
    3         Arizona    8.1     294       80 31.0
    4        Arkansas    8.8     190       50 19.5
    5      California    9.0     276       91 40.6
    6        Colorado    7.9     204       78 38.7
    7     Connecticut    3.3     110       77 11.1
    8        Delaware    5.9     238       72 15.8
    9         Florida   15.4     335       80 31.9
    10        Georgia   17.4     211       60 25.8
    11         Hawaii    5.3      46       83 20.2
    12          Idaho    2.6     120       54 14.2
    13       Illinois   10.4     249       83 24.0
    14        Indiana    7.2     113       65 21.0
    ...

    對于sql語句可能是以‘XX’ 結(jié)尾則需要用如下形式來進(jìn)行查詢

    sqlQuery(con,paste('select * from USArrests',
         "where city = 'Alabama'"))

     city Murder Assault UrbanPop Rape
    1 Alabama   13.2     236       58 21.2

    但是對于Update,以下卻是失效的

    a<-paste("update [master].[dbo].[USArrests]",
         "set Murder =13.2","where city ='Alabama'")
    sqlQuery(con,a)   # 失效
    sqlUpdate(con,a)  # 失效

    sqlUpdate()  sqlUpdate(channel, dat, tablename = NULL, index = NULL,
          verbose = FALSE, test = FALSE, nastring = NULL,
          fast = TRUE)
    不能進(jìn)行腳本語句直接更新,但是可以進(jìn)行如下操作
    foo <- cbind(city=row.names(USArrests), USArrests)[1:3, c(1,3)]
     foo
           city Assault
    Alabama Alabama     236
    Alaska   Alaska     263
    Arizona Arizona     294
     foo[1,2] <- 200
    foo
           city Assault
    Alabama Alabama     200
    Alaska   Alaska     263
    Arizona Arizona     294
    流程是先選定要更新的行列,將值更新,然后再將值update入庫

    實(shí)例如下

temp<-sqlQuery(con,paste('select * from USArrests',
"where city = 'Alabama'"))
temp
     city Murder Assault UrbanPop Rape
1 Alabama   13.2     300       58 21.2
  str(temp)
'data.frame':   1 obs. of  5 variables:
 $ city    : Factor w/ 1 level "Alabama": 1
 $ Murder  : num 13.2
 $ Assault : num 300
 $ UrbanPop: int 58
 $ Rape    : num 21.2
  temp[1,]
[1] 300

  temp[1,3]<-200
  sqlUpdate(con, temp, "USArrests")
  sqlQuery(con,paste('select * from USArrests',
"where city = 'Alabama'"))
     city Murder Assault UrbanPop Rape
1 Alabama   13.2     200       58 21.2

sqlFetch(con, "USArrests", rownames = "city", max = 20,rows_at_time = 10)

實(shí)踐后發(fā)現(xiàn),單單對于簡單的ETL,sqlQuery,sqlUpdate是足夠了,
寫一些for循環(huán)+list.files/liset.dir+reshape/ddplyr/tidyr(進(jìn)行數(shù)據(jù)篩選,清洗,變換),
對于腳本是否執(zhí)行的問題,可以寫日志文件,對ETL過程進(jìn)行檢測。
    step5 構(gòu)建日志文件
    監(jiān)督機(jī)制,對于每次要寫入/更新的數(shù)據(jù)進(jìn)行count(originaldata),保存數(shù)據(jù),然后與每次執(zhí)行寫入/變更的行數(shù)Rows進(jìn)行比對,比對結(jié)果result為邏輯變量。將三者連同 執(zhí)行時(shí)間,寫成一個(gè)數(shù)據(jù)框,并在數(shù)據(jù)中新建對應(yīng)的日志表,對執(zhí)行結(jié)果進(jìn)行監(jiān)督。(對于追蹤Rows沒有找到很好的解決方法,send a letter ...)
    暴力美學(xué),對于sqlSave / sqlQuery 的執(zhí)行結(jié)果進(jìn)行追蹤,成功為“1”,失敗則"ON Errors"
    step6 快速操作數(shù)據(jù)庫
sqlClear deletes all the rows of the table sqtable. #清楚表中數(shù)據(jù)
sqlDrop removes the table sqtable (if permitted). #刪除表
sqlClear(channel, sqtable, errors = TRUE)
sqlDrop(channel, sqtable, errors = TRUE)

數(shù)據(jù)分析咨詢請掃描二維碼

若不方便掃碼,搜微信號:CDAshujufenxi

數(shù)據(jù)分析師資訊
更多

OK
客服在線
立即咨詢
客服在線
立即咨詢
') } function initGt() { var handler = function (captchaObj) { captchaObj.appendTo('#captcha'); captchaObj.onReady(function () { $("#wait").hide(); }).onSuccess(function(){ $('.getcheckcode').removeClass('dis'); $('.getcheckcode').trigger('click'); }); window.captchaObj = captchaObj; }; $('#captcha').show(); $.ajax({ url: "/login/gtstart?t=" + (new Date()).getTime(), // 加隨機(jī)數(shù)防止緩存 type: "get", dataType: "json", success: function (data) { $('#text').hide(); $('#wait').show(); // 調(diào)用 initGeetest 進(jìn)行初始化 // 參數(shù)1:配置參數(shù) // 參數(shù)2:回調(diào),回調(diào)的第一個(gè)參數(shù)驗(yàn)證碼對象,之后可以使用它調(diào)用相應(yīng)的接口 initGeetest({ // 以下 4 個(gè)配置參數(shù)為必須,不能缺少 gt: data.gt, challenge: data.challenge, offline: !data.success, // 表示用戶后臺檢測極驗(yàn)服務(wù)器是否宕機(jī) new_captcha: data.new_captcha, // 用于宕機(jī)時(shí)表示是新驗(yàn)證碼的宕機(jī) product: "float", // 產(chǎn)品形式,包括:float,popup width: "280px", https: true // 更多配置參數(shù)說明請參見:http://docs.geetest.com/install/client/web-front/ }, handler); } }); } function codeCutdown() { if(_wait == 0){ //倒計(jì)時(shí)完成 $(".getcheckcode").removeClass('dis').html("重新獲取"); }else{ $(".getcheckcode").addClass('dis').html("重新獲取("+_wait+"s)"); _wait--; setTimeout(function () { codeCutdown(); },1000); } } function inputValidate(ele,telInput) { var oInput = ele; var inputVal = oInput.val(); var oType = ele.attr('data-type'); var oEtag = $('#etag').val(); var oErr = oInput.closest('.form_box').next('.err_txt'); var empTxt = '請輸入'+oInput.attr('placeholder')+'!'; var errTxt = '請輸入正確的'+oInput.attr('placeholder')+'!'; var pattern; if(inputVal==""){ if(!telInput){ errFun(oErr,empTxt); } return false; }else { switch (oType){ case 'login_mobile': pattern = /^1[3456789]\d{9}$/; if(inputVal.length==11) { $.ajax({ url: '/login/checkmobile', type: "post", dataType: "json", data: { mobile: inputVal, etag: oEtag, page_ur: window.location.href, page_referer: document.referrer }, success: function (data) { } }); } break; case 'login_yzm': pattern = /^\d{6}$/; break; } if(oType=='login_mobile'){ } if(!!validateFun(pattern,inputVal)){ errFun(oErr,'') if(telInput){ $('.getcheckcode').removeClass('dis'); } }else { if(!telInput) { errFun(oErr, errTxt); }else { $('.getcheckcode').addClass('dis'); } return false; } } return true; } function errFun(obj,msg) { obj.html(msg); if(msg==''){ $('.login_submit').removeClass('dis'); }else { $('.login_submit').addClass('dis'); } } function validateFun(pat,val) { return pat.test(val); }