tjinjin's blog

Personal notes, mostly about infrastructure

Loading ALB logs into MySQL with embulk

About

I wanted to analyze our ALB logs, so I tried embulk, which looked like an easy way to do it.

Environment

These are the installed plugins:

$ embulk gem list
2017-07-21 09:46:57.920 +0900: Embulk v0.8.23

*** LOCAL GEMS ***

did_you_mean (default: 1.0.1)
embulk-input-s3 (0.2.11)
embulk-output-mysql (0.7.8)
jar-dependencies (default: 0.3.5)
jruby-openssl (0.9.17 java)
json (1.8.3 java)
minitest (default: 5.4.1)
net-telnet (default: 0.1.1)
power_assert (default: 0.2.3)
psych (2.0.17 java)
racc (1.4.14 java)
rake (default: 10.4.2)
rdoc (default: 4.2.0)
test-unit (default: 3.1.1)

Configuration file

# s3_to_mysql.yml.liquid
in:
  type: s3
  bucket: <s3_bucket> # modify
  path_prefix: AWSLogs/<account_id>/elasticloadbalancing/ap-northeast-1/2017/07/20/ # modify
  auth_method: session
  access_key_id: {{ env.AWS_ACCESS_KEY_ID }}
  secret_access_key: {{ env.AWS_SECRET_ACCESS_KEY }}
  session_token: {{ env.AWS_SESSION_TOKEN }}
  parser:
    charset: UTF-8
    newline: LF
    type: csv
    delimiter: ' '
    quote: ""
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: protocol, type: string}
    - {name: timestamp, type: string}
    - {name: elb, type: string}
    - {name: client_port, type: string}
    - {name: backend_port, type: string}
    - {name: request_processing_time, type: string}
    - {name: backend_processing_time, type: string}
    - {name: response_processing_time, type: string}
    - {name: elb_status_code, type: string}
    - {name: backend_status_code, type: string}
    - {name: received_bytes, type: string}
    - {name: send_bytes, type: string}
    - {name: request, type: string}
    - {name: user_agent, type: string}
    - {name: ssl_cipher, type: string}
    - {name: ssl_protocol, type: string}
    - {name: target_group_arn, type: string}
    - {name: trace_id, type: string}
  decoders:
    - {type: gzip}
out:
  type: mysql
  host: localhost
  user: root
  password: ""
  database: alb_log
  table: alb_log
  mode: replace

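All that's left is to create the alb_log database in MySQL and run embulk run s3_to_mysql.yml.liquid, and you're done! Since every column is loaded as a string, it may be worth changing the types of the time-related columns.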
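For example, here is a minimal sketch of that change. I have not run it against real logs. It types the timestamp column in the csv parser and reads the three processing times as doubles. It assumes ALB writes the time field as ISO 8601 in UTC with fractional seconds and a trailing Z. I also haven't checked which MySQL column type embulk-output-mysql creates for a timestamp column, so check the resulting table before relying on it.

```yaml
    # Sketch only: these entries replace the matching ones under in.parser.columns;
    # all other columns stay as strings, as in the config above.
    # Assumption: ALB writes this field as ISO 8601 in UTC with fractional seconds and a trailing "Z".
    - {name: timestamp, type: timestamp, format: '%Y-%m-%dT%H:%M:%S.%NZ'}
    # Assumption: these three fields always contain numbers.
    - {name: request_processing_time, type: double}
    - {name: backend_processing_time, type: double}
    - {name: response_processing_time, type: double}
```

Doing the conversion in the parser means the values reach MySQL already typed, so nothing has to be altered after the load. The trade-off is that a single unparsable value makes the row fail instead of being stored as text.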