• 欢迎关注微信公众号:九万里大数据
  • 请使用Ctrl+D收藏本站到书签栏
  • 手机也可访问本站 jwldata.com

ClickHouse高可用集群安装配置和创建分布式表

大数据技术 九万里大数据 2年前 (2021-04-08) 505次浏览 0个评论 扫描二维码
文章目录[隐藏]

ClickHouse也可以做分布式部署,拥有shard和replica的概念,类似elasticsearch,但是在配置上面要复杂许多,之前介绍过单节点ClickHouse的安装ClickHouse导入测试数据集,这里在此基础上继续介绍ClickHouse集群化安装配置,创建一个分布式表,并往分布式表里导入测试数据。

3shards_1replicas

3分片1副本,也就是数据被打散到3个分片中,但总副本只有1份。这种一般是为了充分利用分布式计算的能力,但不考虑高可用性。

配置ClickHouse

在三个ClickHouse节点上,找到remote_servers配置项追加perftest_3shards_1replicas配置。注意,三个节点都需要配置。

vi /etc/clickhouse-server/config.xml
    <remote_servers>
        ......
        ......
        <perftest_3shards_1replicas>
            <shard>
                <replica>
                    <host>ctkf01</host>
                    <port>9020</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>ctkf02</host>
                    <port>9020</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>ctkf03</host>
                    <port>9020</port>
                </replica>
            </shard>
        </perftest_3shards_1replicas>
    </remote_servers>

重启ClickHouse

systemctl restart clickhouse-server

创建分布式表

任意选择一个ClickHouse节点比如ctkf01,创建本地表。通过在建表语句中指定ON CLUSTER perftest_3shards_1replicas语法,会自动在另外两个ClickHouse节点上创建本地表。

CREATE TABLE tutorial.hits_local ON CLUSTER perftest_3shards_1replicas
(
    `WatchID` UInt64,
    `JavaEnable` UInt8, 
    `Title` String,
    `GoodEvent` Int16,
    `EventTime` DateTime,
    `EventDate` Date,
    `CounterID` UInt32,
    `ClientIP` UInt32,
    `ClientIP6` FixedString(16),
    `RegionID` UInt32,
    `UserID` UInt64,
    `CounterClass` Int8,
    `OS` UInt8,
    `UserAgent` UInt8,
    `URL` String,
    `Referer` String,
    `URLDomain` String,
    `RefererDomain` String,
    `Refresh` UInt8,
    `IsRobot` UInt8,
    `RefererCategories` Array(UInt16),
    `URLCategories` Array(UInt16),
    `URLRegions` Array(UInt32),
    `RefererRegions` Array(UInt32),
    `ResolutionWidth` UInt16,
    `ResolutionHeight` UInt16,
    `ResolutionDepth` UInt8,
    `FlashMajor` UInt8,
    `FlashMinor` UInt8,
    `FlashMinor2` String,
    `NetMajor` UInt8,
    `NetMinor` UInt8,
    `UserAgentMajor` UInt16,
    `UserAgentMinor` FixedString(2),
    `CookieEnable` UInt8,
    `JavascriptEnable` UInt8,
    `IsMobile` UInt8,
    `MobilePhone` UInt8,
    `MobilePhoneModel` String,
    `Params` String,
    `IPNetworkID` UInt32,
    `TraficSourceID` Int8,
    `SearchEngineID` UInt16,
    `SearchPhrase` String,
    `AdvEngineID` UInt8,
    `IsArtifical` UInt8,
    `WindowClientWidth` UInt16,
    `WindowClientHeight` UInt16,
    `ClientTimeZone` Int16,
    `ClientEventTime` DateTime,
    `SilverlightVersion1` UInt8,
    `SilverlightVersion2` UInt8,
    `SilverlightVersion3` UInt32,
    `SilverlightVersion4` UInt16,
    `PageCharset` String,
    `CodeVersion` UInt32,
    `IsLink` UInt8,
    `IsDownload` UInt8,
    `IsNotBounce` UInt8,
    `FUniqID` UInt64,
    `HID` UInt32,
    `IsOldCounter` UInt8,
    `IsEvent` UInt8,
    `IsParameter` UInt8,
    `DontCountHits` UInt8,
    `WithHash` UInt8,
    `HitColor` FixedString(1),
    `UTCEventTime` DateTime,
    `Age` UInt8,
    `Sex` UInt8,
    `Income` UInt8,
    `Interests` UInt16,
    `Robotness` UInt8,
    `GeneralInterests` Array(UInt16),
    `RemoteIP` UInt32,
    `RemoteIP6` FixedString(16),
    `WindowName` Int32,
    `OpenerName` Int32,
    `HistoryLength` Int16,
    `BrowserLanguage` FixedString(2),
    `BrowserCountry` FixedString(2),
    `SocialNetwork` String,
    `SocialAction` String,
    `HTTPError` UInt16,
    `SendTiming` Int32,
    `DNSTiming` Int32,
    `ConnectTiming` Int32,
    `ResponseStartTiming` Int32,
    `ResponseEndTiming` Int32,
    `FetchTiming` Int32,
    `RedirectTiming` Int32,
    `DOMInteractiveTiming` Int32,
    `DOMContentLoadedTiming` Int32,
    `DOMCompleteTiming` Int32,
    `LoadEventStartTiming` Int32,
    `LoadEventEndTiming` Int32,
    `NSToDOMContentLoadedTiming` Int32,
    `FirstPaintTiming` Int32,
    `RedirectCount` Int8,
    `SocialSourceNetworkID` UInt8,
    `SocialSourcePage` String,
    `ParamPrice` Int64,
    `ParamOrderID` String,
    `ParamCurrency` FixedString(3),
    `ParamCurrencyID` UInt16,
    `GoalsReached` Array(UInt32),
    `OpenstatServiceName` String,
    `OpenstatCampaignID` String,
    `OpenstatAdID` String,
    `OpenstatSourceID` String,
    `UTMSource` String,
    `UTMMedium` String,
    `UTMCampaign` String,
    `UTMContent` String,
    `UTMTerm` String,
    `FromTag` String,
    `HasGCLID` UInt8,
    `RefererHash` UInt64,
    `URLHash` UInt64,
    `CLID` UInt32,
    `YCLID` UInt64,
    `ShareService` String,
    `ShareURL` String,
    `ShareTitle` String,
    `ParsedParams` Nested(
        Key1 String,
        Key2 String,
        Key3 String,
        Key4 String,
        Key5 String,
        ValueDouble Float64),
    `IslandID` FixedString(16),
    `RequestNum` UInt32,
    `RequestTry` UInt8
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)

ClickHouse高可用集群安装配置和创建分布式表

任意选择一个ClickHouse节点比如ctkf01,创建分布式表。通过在建表语句中指定ON CLUSTER perftest_3shards_1replicas语法,会自动在另外两个ClickHouse节点上创建分布式表。分布式表在ClickHouse中表现为视图形式。

CREATE TABLE tutorial.hits_all ON CLUSTER perftest_3shards_1replicas AS tutorial.hits_local
ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand())

ClickHouse高可用集群安装配置和创建分布式表

往分布式表中导入测试数据。

INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1

ClickHouse高可用集群安装配置和创建分布式表

等导入完成后,比较三个节点上hits_local记录数和hits_all记录数,可以看到三个节点的hits_local记录数之和等于hits_all记录数,也等于hits_v1记录数。

host tutorial.hits_local记录数 tutorial.hits_all记录数 tutorial.hits_v1记录数
ctkf01 2957526 8873898 8873898
ctkf02 2957749 8873898
ctkf03 2958623 8873898

2957526+2957749+2958623=8873898

2shards_2replicas

2分片2副本,也就是数据被打散到2个分片中,总副本有2份。这种充分利用了分布式计算的能力,也考虑了高可用性。一般生产环境都是这种多分片多副本的形式。多副本需要借助于ZooKeeper实现高可用,ClickHouse官方推荐生产环境单独搭建ZooKeeper集群,在ZK集群上也不要跑其他程序。

配置ClickHouse

vi /etc/clickhouse-server/config.xml
    <remote_servers>
        ......
        ......
        <perftest_2shards_2replicas>
            <shard>
                <!-- Optional. Shard weight when writing data. Default: 1. -->
                <weight>1</weight>
                <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (less value has more priority). -->
                    <priority>1</priority>
                    <host>ctkf01</host>
                    <port>9020</port>
                </replica>
                <replica>
                    <host>ctkf02</host>
                    <port>9020</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>ctkf03</host>
                    <port>9020</port>
                </replica>
                <replica>
                    <host>ctkf04</host>
                    <port>9020</port>
                </replica>
            </shard>
        </perftest_2shards_2replicas>
    </remote_servers>
    <zookeeper>
        <node>
            <host>ctkf01</host>
            <port>2181</port>
        </node>
        <node>
            <host>ctkf02</host>
            <port>2181</port>
        </node>
        <node>
            <host>ctkf03</host>
            <port>2181</port>
        </node>
    </zookeeper>
    <macros>
        <layer>01</layer>
        <shard>01</shard>
        <replica>ctkf01</replica>
    </macros>

ClickHouse支持使用macros宏定义变量,可以使得建表语句保持统一。注意,这个macros配置每个节点不同,需要单独配置。 以下是另外三个节点的macros配置。其他如remote_servers和zookeeper配置都是一样的。
ctkf02节点

    <macros>
        <layer>01</layer>
        <shard>01</shard>
        <replica>ctkf02</replica>
    </macros>

ctkf03节点

    <macros>
        <layer>01</layer>
        <shard>02</shard>
        <replica>ctkf03</replica>
    </macros>

ctkf04节点

    <macros>
        <layer>01</layer>
        <shard>02</shard>
        <replica>ctkf04</replica>
    </macros>

重启ClickHouse

systemctl restart clickhouse-server

可以通过系统表查询remote_servers的配置是否生效。

SELECT * FROM system.clusters

ClickHouse高可用集群安装配置和创建分布式表

创建分布式表

ClickHouse高可用集群安装配置和创建分布式表


本站文章,如未注明,均为原创 | 原创文章版权归九万里大数据所有,未经许可不得转载。
本文链接:ClickHouse高可用集群安装配置和创建分布式表
喜欢 (1)

您必须 登录 才能发表评论!