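Upgrading Hexo and the NexT Theme with Claude Code

Posted on 2026-03-27 In Tools

Overview

Today I did something pretty cool: I upgraded my blog with help from Claude Code.

Some background: my blog has been running for many years, but the Hexo framework and the NexT theme had been stuck on versions from several years ago:

Hexo: 6.2.0 → 8.1.1
NexT theme: 7.8.0 → 8.27.0

This is no small job: it involves dependency upgrades, configuration migration, compatibility testing, and more. With Claude Code helping, though, the whole process went surprisingly smoothly.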

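What is Claude Code?

Claude Code is a command-line AI assistant from Anthropic, built to help developers with programming tasks. It can:

Understand and carry out complex development tasks
Run shell commands automatically
Read, write, and modify code files
Perform git operations
Review code

In short, it is an AI assistant that understands programming and works directly inside your project.

The upgrade process

1. Project analysis

First, I had Claude Code analyze the current project:

# Analysis commands that Claude Code ran automatically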
ls -la
cat package.json
cat themes/next/package.json
git log --oneline -5

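From this analysis the AI quickly got a picture of the project structure and the tech stack.

2. Version research

Next, I had the AI look up the latest versions of the dependencies:

npm show hexo version
npm show hexo-theme-next version

Results:

Latest Hexo: 8.1.1
Latest NexT theme: 8.27.0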
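
To judge how risky an upgrade is, it can help to look at how many major versions it crosses, since major bumps are where breaking changes usually land. The following is a minimal sketch, not something Claude Code ran; `major_of` and `major_jump` are hypothetical helper names, and the version strings are the ones quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major component of a semver-like string.
major_of() {
  local v="${1#v}"        # tolerate a leading "v", e.g. v8.1.1
  printf '%s\n' "${v%%.*}"
}

# Hypothetical helper: number of major versions between <old> and <new>.
major_jump() {
  echo $(( $(major_of "$2") - $(major_of "$1") ))
}

echo "hexo major jump: $(major_jump 6.2.0 8.1.1)"
echo "next major jump: $(major_jump 7.8.0 8.27.0)"
```

A jump of one or more is a hint to read the release notes before running npm install.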

3. Running the upgrade

This is the most complex part and involves several steps:

3.1 Updating package.json

The AI updated every dependency version in package.json for me:

{
  "dependencies": {
    "hexo": "^8.1.1",
    "hexo-deployer-git": "^4.0.0",
    "hexo-generator-archive": "^2.0.0",
    "hexo-generator-category": "^2.0.0",
    "hexo-generator-index": "^3.0.0",
    "hexo-generator-searchdb": "^1.4.1",
    "hexo-generator-tag": "^2.0.0",
    "hexo-renderer-marked": "^6.0.0",
    "hexo-renderer-stylus": "^3.0.0",
    "hexo-theme-next": "^8.27.0"
  }
}

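3.2 Installing dependencies

npm install

3.3 Configuration migration

This is the most tedious part. NexT 8.x changed how configuration works, so my old settings had to be migrated to the new format.

The AI:

Read the old themes/next/_config.yml configuration
Read the default configuration of the new version
Compared the two and generated a compatible configuration
Wrote the result into the theme_config section of _config.yml

Settings that were preserved:

Social links (GitHub, Weibo, Twitter)
Reward (donation) feature
Baidu Analytics and Baidu push
Custom fonts
Animation effects
Local search

3.4 Test build

npm run build

The build succeeded and generated 94 files!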

3.5 Local preview

npm run server

I opened http://localhost:4000 to check the result, and everything looked fine!

4. Commit and push

As the last step, I had the AI handle the git operations:

git add package.json package-lock.json yarn.lock _config.yml
git commit -m "升级：Hexo 8.1.1 和 NexT 主题 8.27.0"
git push origin master

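My experience

Pros

Efficient: the whole upgrade took less than half an hour; doing it by hand could have taken several hours or longer.
Automated: I did not have to run commands one by one; the AI analyzed, decided, and executed on its own.
Fewer mistakes: editing configuration files by hand is error-prone, while the AI can carefully compare and migrate the settings.
Context awareness: the AI understands the overall project structure, so its decisions are more sensible.
Interactive collaboration: I could ask questions or change direction at any time, and the AI responded immediately.

Cons

Manual confirmation is still needed: critical operations such as git push still require a human to confirm, so you cannot be fully hands-off.
Complex decisions still need a human: for architecture-level decisions (for example, whether to migrate to Next.js), the AI can only offer suggestions; the final call is yours.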

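Upgrade results

Item	Old version	New version
Hexo	6.2.0	8.1.1
NexT	7.8.0	8.27.0
Dependencies	Various old versions	All latest

Changes after the upgrade:

FontAwesome icons upgraded to v6 (fa → fab/fa/fa-solid)
New Light-Dark mode support (optional)
Improved mobile layout
Better performance

Summary

This upgrade left a strong impression on me. AI assistants like Claude Code are changing the way we write code: they do not replace us, they become capable helpers.

Repetitive, tedious tasks such as dependency upgrades and configuration migration can be handled efficiently by the AI, while we focus on more valuable things such as writing content and designing architecture.

Recommendation: if you also have an aging project that needs upgrading, try getting an AI assistant to help.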

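Overview

Using Cert-Manager to issue free certificates in Kubernetes makes certificates much easier to manage and use. Cert-Manager is an open-source project for managing and automating certificates. If your requirements for assurance level and certificate features are modest, Cert-Manager can obtain certificates from Let's Encrypt over the ACME protocol and renew them automatically.

Prerequisites

A Kubernetes cluster
kubectl and helm installed

Installing Cert-Manager with Helm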

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.2
  # Add "--set installCRDs=true" to the command above if the cert-manager CRDs are not installed yet.

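ACME

ACME offers two challenge types that are relevant here: HTTP01 and DNS01.

HTTP01

HTTP01 is an HTTP-based challenge: the CA verifies that you control the domain by fetching a token from http://<domain>/.well-known/acme-challenge/<token>. It can be used to obtain free certificates from Let's Encrypt.

DNS01

DNS01 is a DNS-based challenge that can also be used to obtain free certificates from Let's Encrypt.

With DNS01, a TXT record has to be created under _acme-challenge.<domain>, with a value derived from the challenge token. If your DNS provider offers an API, this can be automated through that API.

My DNS provider does not offer an API, so I used HTTP01.

ACME Issuer

The Issuer is the most important resource in Cert-Manager; it drives the certificate issuance flow. There are two kinds:

ClusterIssuer: handles certificate issuance cluster-wide.
Issuer: handles certificate issuance within a single namespace.

Creating a ClusterIssuer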

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: user@example.com
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key.
      name: example-issuer-account-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx

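Notes:

email: the address Let's Encrypt uses to contact you about expiring certificates and account issues; replace it with your own.

server: the address of the Let's Encrypt ACME server. The one used here is the test server: https://acme-staging-v02.api.letsencrypt.org/directory is the Staging environment and https://acme-v02.api.letsencrypt.org/directory is the Production environment. Certificates issued by Staging are not trusted by browsers, so production use requires the production ACME server address.

privateKeySecretRef: the name of the Secret that stores the ACME account's private key.

solvers: selects the ACME validation method, either HTTP01 or DNS01. With HTTP01 you need to specify ingressClassName under ingress, otherwise an error is reported.

ingressClassName: the Ingress class that cert-manager uses for the solver Ingress it creates.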
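
Since Staging and Production differ only in the server URL, the same manifest can be generated for either environment. Here is a rough sketch under a few assumptions: `render_cluster_issuer` is a hypothetical helper, and the issuer name `letsencrypt-prod` and the derived Secret name are illustrative choices, not something from my cluster. It only prints YAML and does not touch the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a ClusterIssuer manifest for a given ACME server.
# Arguments: <issuer-name> <email> <acme-server-url>
render_cluster_issuer() {
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: $1
spec:
  acme:
    email: $2
    server: $3
    privateKeySecretRef:
      name: $1-account-key
    solvers:
    - http01:
        ingress:
          ingressClassName: nginx
EOF
}

STAGING_URL="https://acme-staging-v02.api.letsencrypt.org/directory"
PRODUCTION_URL="https://acme-v02.api.letsencrypt.org/directory"

# Print the production variant; pipe it into "kubectl apply -f -" to create it.
render_cluster_issuer letsencrypt-prod user@example.com "$PRODUCTION_URL"
```

Testing against Staging first and switching to Production afterwards avoids burning through the production rate limits while debugging.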

Creating a certificate for a Kubernetes Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    # add an annotation indicating the issuer to use.
    cert-manager.io/cluster-issuer: nameOfClusterIssuer
  name: myingress
  namespace: myingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: myservice
            port:
              number: 80
  tls: # < placing a host in the TLS config will determine what ends up in the cert's subjectAltNames
  - hosts:
    - example.com
    secretName: myingress-cert # < cert-manager will store the created certificate in this secret.

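Notes:

cert-manager.io/cluster-issuer: the name of the ClusterIssuer that should issue the certificate.

hosts: the domain names to include in the certificate.

secretName: the name of the Secret in which the certificate is stored.

Normally the certificate would be issued at this point and the service would be reachable at https://example.com. In my case, however, I ran into a problem.

The problem

After creating the Ingress, certificate issuance failed. I checked the status and logs: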

kubectl get Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges -n partner 
NAME                                                READY   AGE
clusterissuer.cert-manager.io/letsencrypt-staging   True    42h

NAME                                          READY   SECRET            AGE
certificate.cert-manager.io/myingress-cert    False   myingress-cert    42h

NAME                                                   APPROVED   DENIED   READY   ISSUER                REQUESTOR                                         AGE
certificaterequest.cert-manager.io/myingress-cert-1    True                False   letsencrypt-staging   system:serviceaccount:cert-manager:cert-manager   11h

NAME                                                      STATE     AGE
order.acme.cert-manager.io/myingress-cert-1-1904933461   invalid   11h

Next, look at the details of the Challenge:

kubectl describe challenges.acme.cert-manager.io/myingress-cert-1-1904933461 -n partner
Name:         myingress-cert-1-1904933461-2937188937
Namespace:    partner
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1
Kind:         Challenge
Metadata:
  Creation Timestamp:  2023-12-05T13:39:58Z
  Finalizers:
    finalizer.acme.cert-manager.io
  Generation:  1
  Owner References:
    API Version:           acme.cert-manager.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  myingress-cert-1-1904933461
    UID:                   1691de2d-1c32-4f10-bee7-81b1d4c9f7bc
  Resource Version:        4440169
  UID:                     4092e71c-4028-47f9-8c0f-f0d5e95a4e81
Spec:
  Authorization URL:  https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/9874881604
  Dns Name:           example.com
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   ClusterIssuer
    Name:   letsencrypt-staging
  Key:      xxxxxxxxx.xxxxxxxxx
  Solver:
    http01:
      Ingress:
        Ingress Class Name:  nginx
  Token:                     xxxxxxxxx
  Type:                      HTTP-01
  URL:                       https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/9874881604/fKmvZw
  Wildcard:                  false
Status:
  Presented:   false
  Processing:  false
  Reason:      Error accepting authorization: acme: authorization error for example.com: 400 urn:ietf:params:acme:error:connection: xxx.xx.xx.xxx: Fetching http://example.com/.well-known/acme-challenge/xxxxxxxxx: Timeout during connect (likely firewall problem)
  State:       invalid
Events:        <none>

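Here xxx.xx.xx.xxx is my public IP.

In the Cert-Manager logs I found that requests to http://example.com/.well-known/acme-challenge/xxxxxxxxx initially returned 404 rather than 400, so I suspected a network problem.

My network has an internal DNS server that resolves the domain directly to an internal IP, while access from the internet uses public DNS and the public IP. The egress router forwards public ports to internal ports, which is how internal services are reachable from the internet. There is one restriction: traffic from the internal network is not allowed to go out to the forwarded ports on the public IP and come back in.

I suspected that when Cert-Manager validated the token it reached example.com via the public address and timed out. After lifting the restriction on internal traffic looping out through the public address and back in, the test succeeded.

Analysis

During domain validation, Cert-Manager accesses the domain from inside the container environment, and that lookup used public DNS rather than the internal DNS. A better approach would be to make it use the internal DNS, so the restriction on public access would not matter, but I have not yet found how to configure that and will look into it later.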
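
To narrow down this kind of problem, the challenge URL can be fetched by hand from different vantage points (an internal host, a Pod, an external machine) and the results compared. This is only a sketch under stated assumptions: `acme_challenge_url` and `probe_challenge` are hypothetical helper names, the token is the placeholder from the output above, and curl is assumed to be installed wherever the probe runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the HTTP01 challenge URL for a domain and token.
acme_challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Hypothetical helper: fetch the challenge URL and print only the HTTP status
# code; curl prints 000 when the connection fails or times out.
probe_challenge() {
  curl -s -o /dev/null -m 10 -w '%{http_code}\n' "$(acme_challenge_url "$1" "$2")"
}

# Show the URL that would be probed. Run probe_challenge with the same
# arguments from an internal host and from an external one, then compare.
acme_challenge_url example.com xxxxxxxxx
```

A 000 from inside the network together with a normal status code from outside would point at the loopback restriction described above rather than at Cert-Manager itself.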

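Fixing NFS access failures

Posted on 2022-10-05 In k8s

Today I upgraded Kubernetes to 1.24 and found that newly created PVCs stayed in the Pending state. On inspection it turned out to be the same problem I had hit before; see my earlier post on PVCs failing to be created after the upgrade to Kubernetes v1.20.

I then checked /etc/kubernetes/manifests/kube-apiserver.yaml. The RemoveSelfLink=false setting I had added earlier was indeed gone after the upgrade, so I added it back the same way as before and waited for the API server to restart.

However, kube-apiserver would no longer start.

Checking kube-apiserver showed that RemoveSelfLink=false is no longer accepted in the new version. After some investigation and testing, the fix is to replace nfs-client-provisioner.

nfs-client-provisioner has to be replaced with nfs-subdir-external-provisioner. I deploy with helm, so the replacement steps are:

First, add the helm repository:

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner
helm repo update

Remove the old deployment:

helm uninstall nfs-prod

Edit the values file. The new file looks like this: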

image:
  repository: registry.cn-shanghai.aliyuncs.com/c7n/nfs-subdir-external-provisioner
storageClass:
  name: <name>
  archiveOnDelete: false
  defaultClass: true
nfs:
  server: <nfs-ip>
  path: <nfs-path>
nodeSelector: {}

Redeploy:

helm install <name> nfs-subdir-external-provisioner/nfs-subdir-external-provisioner -f <value file>

Testing confirmed that everything works.

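Setting up an Azure Bicep development environment

Posted on 2022-02-08

I recently set up a Bicep environment on macOS and found that VS Code never managed to recognize Bicep. The Bicep extension depends on the .NET runtime at startup, and even after I installed .NET manually it still failed to start.

References:

https://github.com/dotnet/vscode-dotnet-runtime/blob/main/Documentation/troubleshooting-runtime.md#install-script-timeouts

https://www.azuredeveloper.cn/article/azure-tutorial-auzre-bicep-introduction

Edit the VS Code settings and add the following: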

"dotnetAcquisitionExtension.existingDotnetPath": [
    {"extensionId": "ms-azuretools.vscode-bicep", "path": "/usr/local/share/dotnet/dotnet"},
    {"extensionId": "msazurermtools.azurerm-vscode-tools", "path": "/usr/local/share/dotnet/dotnet"}
]

After that, everything works normally.

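Kubernetes certificate expiry (part 2)

Posted on 2022-01-15 In k8s

Today, while working with Deployments in Kubernetes, I noticed that deleting a Deployment did not automatically delete its ReplicaSet and Pods. At first I suspected a faulty worker node, but restarting it or moving the workload to another node made no difference. I then restarted the master nodes and found that two of them came up but could not register their node.

My environment has three master nodes. Checking the containers showed that master01 started normally, while the etcd containers on master02 and master03 would not start. The log showed:

Jan 14 22:29:29 master02.k8s kubelet[1998]: E0114 22:29:29.187573 1998 kubelet.go:2291] "Error getting node" err="node \"master02.k8s\" not found"

The etcd log on master01 showed:

2022-01-14 15:57:13.963722 I | embed: rejected connection from "192.168.203.4:45008" (error "tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid", ServerName "")

This made it clear that an expired certificate was the cause. Based on my experience from the previous expiry, I checked as follows: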

# kubeadm certs renew
missing subcommand; "renew" is not meant to be run on its own
To see the stack trace of this error execute with --v=5 or higher
[root@master02 kubernetes]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 10, 2022 00:10 UTC   238d                                    no      
apiserver                  Sep 10, 2022 00:20 UTC   238d            ca                      no      
apiserver-etcd-client      Sep 10, 2022 00:20 UTC   238d            etcd-ca                 no      
apiserver-kubelet-client   Sep 10, 2022 00:20 UTC   238d            ca                      no      
controller-manager.conf    Sep 10, 2022 00:09 UTC   238d                                    no      
etcd-healthcheck-client    Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no      
etcd-peer                  Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no      
etcd-server                Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no      
front-proxy-client         Sep 10, 2022 00:20 UTC   238d            front-proxy-ca          no      
scheduler.conf             Sep 10, 2022 00:09 UTC   238d                                    no      

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 24, 2029 07:18 UTC   7y              no      
etcd-ca                 Dec 24, 2029 07:18 UTC   7y              no      
front-proxy-ca          Dec 24, 2029 07:18 UTC   7y              no

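The etcd certificates have indeed expired, so renew them:

# kubeadm certs renew etcd-healthcheck-client
# kubeadm certs renew etcd-peer
# kubeadm certs renew etcd-server

After confirming that the new certificates were in effect, I restarted the containers and kubelet, and the problem was resolved.

The cause is that master01 had already renewed its certificates automatically, which kept the cluster usable, but the certificates on the other master nodes had not been renewed and had to be handled manually. Hopefully a future version will address this.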
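
With several master nodes it is easy to miss an expired certificate in the table. As a minimal sketch (not what I ran at the time), the check-expiration output can be filtered for rows marked <invalid>; `expired_certs` is a hypothetical helper, and the sample rows are copied from the output above. The loop only prints the renew commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "kubeadm certs check-expiration" output on stdin
# and print the name of every certificate whose residual time is "<invalid>".
expired_certs() {
  awk '/<invalid>/ { print $1 }'
}

# Sample rows copied from the output above.
sample='apiserver                  Sep 10, 2022 00:20 UTC   238d            ca                      no
etcd-healthcheck-client    Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no
etcd-server                Dec 22, 2021 23:53 UTC   <invalid>       etcd-ca                 no'

# Print the renew command for each expired certificate instead of running it.
printf '%s\n' "$sample" | expired_certs | while read -r name; do
  echo "kubeadm certs renew $name"
done
```

On a real node the sample would be replaced by piping kubeadm certs check-expiration into expired_certs.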

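AAD Connect sync error after deleting and recreating an AD user

Posted on 2021-11-19

Background

We subscribe to Microsoft 365 and use Azure AD Connect to sync users from our on-premises AD to Azure AD. A user was deleted from AD by mistake and then recreated immediately, before a sync had run. As a result, that user could no longer be synced.

Root cause analysis

When a user is synced through Azure AD Connect, the user's sourceAnchor attribute in AD is written to the immutableId attribute in Azure AD, and this value links the two objects.

There are many possible choices for sourceAnchor; see: Azure AD Connect: Design concepts

Current versions use ConsistencyGuid as the sourceAnchor attribute, while older versions used ObjectGuid by default.

ConsistencyGuid is normally the same as ObjectGuid. When a user is recreated, its ObjectGuid changes, so at sync time the ConsistencyGuid in AD no longer matches the immutableId in Azure AD even though both users have the same UPN (userPrincipalName), and the sync fails with a conflict.

Deleting the user in Azure AD would of course be a simple fix, but it would also destroy all of that user's mail and other data.

Solution

Once the cause is understood, the fix is straightforward: make the ConsistencyGuid in AD and the immutableId in Azure AD match.

Look up the immutableId in Azure AD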

PS C:\>  Get-MsolUser -UserPrincipalName user1@abc.com | Select-Object UserprincipalName,ImmutableID

UserPrincipalName         ImmutableId
-----------------         -----------
user1@abc.com             K0fuEdJZe0ulRQq3+WlZTA==

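The user's immutableId is K0fuEdJZe0ulRQq3+WlZTA==

Convert the immutableId value

The immutableId is the raw GUID bytes encoded as Base64, so it has to be decoded. An online tool works; choose hexadecimal as the output format.

The converted value looks like: 2B 47 EE 11 D2 59 7B 4B A5 45 0A B7 F9 69 59 4C

Update ConsistencyGuid in AD

Log in to the AD server, open ADSI Edit, find the user, open its properties, locate the ms-DS-ConsistencyGuid attribute, and set it to the value from the previous step.

Sync

Either wait for the next scheduled sync or trigger one manually. Afterwards, verify that the result is correct.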
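
The Base64-to-hex conversion in the second step does not need an online tool. Below is a small sketch, assuming a machine with GNU coreutils (base64 -d and od; the flags may differ on other platforms); `immutable_id_to_hex` is a hypothetical helper name, and the input is the immutableId shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a Base64 immutableId and print its bytes as
# space-separated uppercase hex pairs.
# Assumes GNU coreutils (base64 -d, od); flags may differ on other platforms.
immutable_id_to_hex() {
  printf '%s' "$1" | base64 -d | od -An -tx1 | tr -d '\n' \
    | tr 'a-f' 'A-F' | sed 's/^ *//; s/  */ /g; s/ *$//'
  echo
}

# Prints the same byte sequence as the converted value quoted above.
immutable_id_to_hex 'K0fuEdJZe0ulRQq3+WlZTA=='
```

Decoding locally also avoids pasting a directory identifier into a third-party website.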

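Follow-up notes

While working on this I also tried changing the immutableId in Azure AD instead, which felt like the more natural fix. I ran into two problems:

If the user in Azure AD is in a synced state, the value cannot be changed; you have to wait until the user becomes a non-synced user, or turn sync off.
In a test I did change the immutableId in Azure AD, but a manual sync still reported an error and I did not test further. In theory this approach should work.

For the record, here are the two commands:

Change the immutableId in Azure AD

Get-MsolUser -UserPrincipalName user1@abc.com | Set-MsolUser -ImmutableId "L0b1Dn3oIkGiFLPW9fhY+Q=="

Turn off sync

Set-MsolDirSyncEnabled -EnableDirSync $false

Note: according to the official documentation, after turning sync off you have to wait 72 hours before it can be turned on again.

I also found a suggestion online to uninstall and reinstall Azure AD Connect. The idea is likewise to re-match the users in AD and Azure AD, but I tried it and it did not solve the problem.

Finally, the best protection against this kind of mistake is to enable the AD Recycle Bin, which can be turned on in the Active Directory Administrative Center. With it enabled, a deleted user can simply be restored and this problem is avoided entirely.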

Error installing the MSOnline module in PowerShell

Posted on 2021-11-17

Installing the MSOnline module on Windows Server 2016 failed with the following error:

PS C:\> Install-Module MSOnline
需要使用 NuGet 提供程序来继续操作
PowerShellGet 需要使用 NuGet 提供程序“2.8.5.201”或更高版本来与基于 NuGet 的存储库交互。必须在“C:\Program
Files\PackageManagement\ProviderAssemblies”或“C:\Users\administrator.WAFERSYSTEMS1\AppData\Local\PackageManagement\ProviderAssemblies”中提供 NuGet 提供程序。也可以通过运行
'Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force' 安装 NuGet 提供程序。是否要让 PowerShellGet 立即安装并导入 NuGet 提供程序?
[Y] 是(Y)  [N] 否(N)  [S] 暂停(S)  [?] 帮助 (默认值为“Y”): y
警告: 无法从 URI“https://go.microsoft.com/fwlink/?LinkID=627338&clcid=0x409”下载到“”。
警告: 无法下载可用提供程序列表。请检查 Internet 连接。
PackageManagement\Install-PackageProvider : 找不到提供程序“NuGet”的指定搜索条件的匹配项。程序包提供程序要求 "PackageManagement" 和 "Provider" 标记。请检查指定的程序包是否具有标记。
所在位置 C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:7405 字符: 21
+ ...     $null = PackageManagement\Install-PackageProvider -Name $script:N ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (Microsoft.Power...PackageProvider:InstallPackageProvider) [Install-PackageProvider]，Exception
    + FullyQualifiedErrorId : NoMatchFoundForProvider,Microsoft.PowerShell.PackageManagement.Cmdlets.InstallPackageProvider

PackageManagement\Import-PackageProvider : 未找到与指定搜索条件和提供程序名称“NuGet”匹配的项目。请尝试运行 'Get-PackageProvider -ListAvailable' 以查看系统中是否存在该提供程序。
所在位置 C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:7411 字符: 21
+ ...     $null = PackageManagement\Import-PackageProvider -Name $script:Nu ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (NuGet:String) [Import-PackageProvider]，Exception
    + FullyQualifiedErrorId : NoMatchFoundForCriteria,Microsoft.PowerShell.PackageManagement.Cmdlets.ImportPackageProvider

警告: 无法从 URI“https://go.microsoft.com/fwlink/?LinkID=627338&clcid=0x409”下载到“”。
警告: 无法下载可用提供程序列表。请检查 Internet 连接。
PackageManagement\Get-PackageProvider : 找不到程序包提供程序“NuGet”。可能尚未导入该提供程序。请尝试使用 'Get-PackageProvider -ListAvailable'。
所在位置 C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:7415 字符: 30
+ ... tProvider = PackageManagement\Get-PackageProvider -Name $script:NuGet ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (Microsoft.Power...PackageProvider:GetPackageProvider) [Get-PackageProvider], Exception
    + FullyQualifiedErrorId : UnknownProviderFromActivatedList,Microsoft.PowerShell.PackageManagement.Cmdlets.GetPackageProvider

Install-Module : 需要使用 NuGet 提供程序来与基于 NuGet 的存储库交互。请确保已安装 NuGet 提供程序“2.8.5.201”或更高版本。
所在位置 行:1 字符: 1
+ Install-Module msonline
+ ~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Install-Module]，InvalidOperationException
    + FullyQualifiedErrorId : CouldNotInstallNuGetProvider,Install-Module

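The messages point to a failure while installing NuGet. After some searching I found the cause: PowerShell does not use TLS 1.2 by default. Force TLS 1.2 as shown here, then run the install command again:

[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12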

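Problems encountered upgrading Kubernetes to 1.21

Posted on 2021-09-10 In k8s

I ran into two problems when upgrading Kubernetes from 1.20 to 1.21.

`Core DNS` `image` download fails

Pull the images with the following command: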

> kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers     
I0910 08:00:20.155181   16619 version.go:254] remote version is much newer: v1.22.1; falling back to: stable-1.21
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.21.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.21.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.21.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.21.4
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.4.1
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.4.13-0
failed to pull image "registry.aliyuncs.com/google_containers/coredns:v1.8.0": output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.0 not found: manifest unknown: manifest unknown
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher

registry.aliyuncs.com/google_containers/coredns:v1.8.0 could never be pulled. It turned out that the image is tagged 1.8.0 in this registry, not v1.8.0, so we can pull it manually and retag it:

With the docker runtime:

docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0

With containerd:

crictl pull registry.aliyuncs.com/google_containers/coredns:1.8.0
ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0

After that the upgrade proceeds normally.
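
The pull-and-retag workaround can be derived mechanically from the image reference kubeadm asks for. The sketch below assumes the only difference is the leading "v" in the tag, as was the case here; `retag_commands` is a hypothetical helper that prints the docker commands rather than executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the image reference kubeadm expects (tag with a
# leading "v"), print the docker commands that pull the un-prefixed tag and
# retag it. Nothing is executed; the commands are only printed.
retag_commands() {
  local expected="$1"
  local repo="${expected%:*}"   # everything before the last ":"
  local tag="${expected##*:}"   # everything after the last ":"
  local actual="${repo}:${tag#v}"
  echo "docker pull ${actual}"
  echo "docker tag ${actual} ${expected}"
}

retag_commands registry.aliyuncs.com/google_containers/coredns:v1.8.0
```

The printed lines match the two docker commands above and can be reviewed before being pasted into a shell.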

`Failed to start ContainerManager` error

After the upgrade, kubelet failed to start on the worker nodes with:

Failed to start ContainerManager failed to initialise top level QOS containers

Fix:

systemctl stop kubepods-burstable.slice
systemctl restart kubelet

Note that this restarts every Pod running on the node; once they are back, everything works normally.

Handling a vCenter upgrade failure caused by an expired password

Posted on 2021-02-19

Today, while upgrading vCenter 7.0, the pre-upgrade check reported an error. The log showed:

2021-02-19T06:12:41.374Z - debug: initiateFileTransferFromGuest error: ServerFaultCode: Failed to authenticate with the guest operating system using the supplied credentials.
2021-02-19T06:12:41.374Z - debug: Failed to get fileTransferInfo:ServerFaultCode: Failed to authenticate with the guest operating system using the supplied credentials.
2021-02-19T06:12:41.374Z - debug: Failed to get url of file in guest vm:ServerFaultCode: Failed to authenticate with the guest operating system using the supplied credentials.

It turned out that the vCenter root password had expired. Log in to vCenter over ssh, enter the shell, and run:

root@vcsa [ ~ ]# chage -l root
You are required to change your password immediately (root enforced)
chage: PAM: Authentication token is no longer valid; new one required

This confirms that the root password has expired; changing it fixes the problem:

root@vcsa [ ~ ]# passwd
New password: 
Retype new password: 
passwd: password updated successfully

Then re-run the upgrade pre-check, and it passes.

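Kubernetes certificate expiry

Posted on 2020-12-28 In k8s

Today I got the following error when accessing Kubernetes:

> kubectl get node
error: You must be logged in to the server (Unauthorized)

It worked yesterday but not today, so I suspected an expired certificate. You can inspect the certificate files on the master directly; they are under /etc/kubernetes/pki. Run: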

> for item in `find /etc/kubernetes/pki -maxdepth 2 -name "*.crt"`;do openssl x509 -in $item -text -noout| grep Not;echo ======================$item===============;done

            Not Before: Dec 27 07:18:44 2019 GMT
            Not After : Dec 24 07:18:44 2029 GMT
======================/etc/kubernetes/pki/ca.crt===============
            Not Before: Dec 27 07:18:44 2019 GMT
            Not After : Dec 22 23:43:16 2021 GMT
======================/etc/kubernetes/pki/apiserver.crt===============
            Not Before: Dec 27 07:18:44 2019 GMT
            Not After : Dec 22 23:43:17 2021 GMT
======================/etc/kubernetes/pki/apiserver-kubelet-client.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 24 07:18:45 2029 GMT
======================/etc/kubernetes/pki/front-proxy-ca.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 22 23:43:17 2021 GMT
======================/etc/kubernetes/pki/front-proxy-client.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 24 07:18:45 2029 GMT
======================/etc/kubernetes/pki/etcd/ca.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 22 23:42:44 2021 GMT
======================/etc/kubernetes/pki/etcd/server.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 22 23:42:44 2021 GMT
======================/etc/kubernetes/pki/etcd/peer.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 22 23:42:45 2021 GMT
======================/etc/kubernetes/pki/etcd/healthcheck-client.crt===============
            Not Before: Dec 27 07:18:45 2019 GMT
            Not After : Dec 22 23:43:17 2021 GMT
======================/etc/kubernetes/pki/apiserver-etcd-client.crt===============

The certificates have indeed expired, so we just need to renew them:

> kubeadm alpha certs renew all
Command "all" is deprecated, please use the same command under "kubeadm certs"
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

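According to the message this command is deprecated in favor of kubeadm certs. I left it as is this time and will try the new command next time.

Afterwards, copy /etc/kubernetes/admin.conf to ~/.kube/config and access works again. I was able to connect even before restarting kube-apiserver, kube-controller-manager, kube-scheduler and etcd, but to be safe, restart them anyway.

Once you can connect, the certificate status can also be checked with kubeadm: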

> kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Dec 28, 2021 00:06 UTC   364d                                    no      
apiserver                  Dec 28, 2021 00:06 UTC   364d            ca                      no      
apiserver-etcd-client      Dec 28, 2021 00:06 UTC   364d            etcd-ca                 no      
apiserver-kubelet-client   Dec 28, 2021 00:06 UTC   364d            ca                      no      
controller-manager.conf    Dec 28, 2021 00:06 UTC   364d                                    no      
etcd-healthcheck-client    Dec 28, 2021 00:06 UTC   364d            etcd-ca                 no      
etcd-peer                  Dec 28, 2021 00:06 UTC   364d            etcd-ca                 no      
etcd-server                Dec 28, 2021 00:06 UTC   364d            etcd-ca                 no      
front-proxy-client         Dec 28, 2021 00:06 UTC   364d            front-proxy-ca          no      
scheduler.conf             Dec 28, 2021 00:06 UTC   364d                                    no      

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 24, 2029 07:18 UTC   8y              no      
etcd-ca                 Dec 24, 2029 07:18 UTC   8y              no      
front-proxy-ca          Dec 24, 2029 07:18 UTC   8y              no

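PVCs cannot be created after upgrading to Kubernetes v1.20

Posted on 2020-12-23 In k8s

Today I upgraded Kubernetes to 1.20 and found that newly created PVCs stayed in the Pending state. The nfs-client-provisioner log contained the following error:

provision "test/test-sql" class "nfs-storage": unexpected error getting claim reference: selfLink was empty, can't make reference

Some searching showed that this is caused by a change that takes effect by default as of v1.20; for details see: https://github.com/kubernetes/enhancements/issues/1164

With the cause identified, edit /etc/kubernetes/manifests/kube-apiserver.yaml directly and add the parameter:

- --feature-gates=RemoveSelfLink=false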

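Adding a containerd-runtime node to Kubernetes

Posted on 2020-12-22 In k8s

As of 1.20, Kubernetes no longer recommends docker as the runtime. There are other container runtimes to choose from:

containerd
CRI-O

After comparing them I chose containerd for a trial. I did not upgrade the whole Kubernetes cluster this time; I only wanted to add one node that uses containerd as its container runtime.

Installing `containerd`

I use CentOS 7. The kernel parameters have to be set first, the same as for Docker before: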

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Set the required sysctl parameters; they persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

Then install it. containerd comes from the same yum repository as docker-ce:

# Install containerd
## Set up the repository
### Install required packages
sudo yum install -y yum-utils device-mapper-persistent-data lvm2

### Add the Docker repository
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

## Install containerd
sudo yum update -y && sudo yum install -y containerd.io

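Configuring `containerd`

First generate the default configuration:

# Configure containerd
sudo mkdir -p /etc/containerd
# Pipe through "sudo tee" so the file is written with root privileges.
containerd config default | sudo tee /etc/containerd/config.toml

Some image registries are unreachable from mainland China and Docker Hub is slow, so switch to domestic mirrors:

vi /etc/containerd/config.toml

# Change the following: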
...
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"
...
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://docker.mirrors.ustc.edu.cn"]
...

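# Restart containerd
sudo systemctl restart containerd

Joining the k8s cluster

First install the k8s packages: kubeadm, kubelet, and kubectl

yum install -y kubeadm kubelet kubectl

Then join the cluster:

kubeadm join master.k8s:8443 --token xxx --discovery-token-ca-cert-hash sha256:xxxxxxx

The node running the containerd runtime has now joined the cluster, and test containers deployed to it work normally.

Managing containerd

With containerd you can no longer use the familiar docker command to manage containers. Two tools are available instead: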

ctr
crictl

ctr

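ctr is a simple command-line tool that is not complicated to use, but note:

ctr does not use the CRI registry settings in /etc/containerd/config.toml, so the configured mirror has no effect
images are namespaced too, and k8s uses a namespace named k8s.io
ctr is strict about argument order: ctr -n=k8s.io images list works, while ctr images list -n=k8s.io does not

Hopefully ctr will become more polished in the future.

crictl

crictl works much like the docker command and is convenient to use. Before using it, add a configuration file, /etc/crictl.yaml, with the following content: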

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false

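Kubernetes v1.18 ipvs failure

Posted on 2020-04-27 In k8s

I found a serious problem when upgrading Kubernetes to 1.18:

It started with some Services being unreachable. After a lot of checking I found that a Service becomes unreachable once its Pod has restarted.

To troubleshoot, I set up a test Deployment and Service. The Deployment: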

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hostnames
spec:
  selector:
    matchLabels:
      app: hostnames
  replicas: 1
  template:
    metadata:
      labels:
        app: hostnames
    spec:
      containers:
      - name: hostnames
        image: mirrorgooglecontainers/serve_hostname
        ports:
        - containerPort: 9376
          protocol: TCP

The Service:

apiVersion: v1
kind: Service
metadata:
  name: hostnames
spec:
  selector:
    app: hostnames
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 9376

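Then start a curl Pod for testing:

kubectl run curl --image=radial/busyboxplus:curl -it

When things work, the result is:

[ root@curl-69c656fd45-ztblt:/ ]$ curl hostnames
hostnames-dd4cc9dd9-5k42b

The Pod is reachable through the Service.

Now delete the hostnames Pod, wait until the new Pod is running, and run the command again:

[ root@curl-69c656fd45-ztblt:/ ]$ curl hostnames
connect to hostnames failed: No route to host

DNS checked out fine, and the Pod is reachable by its Pod IP, but not through the Service IP. In other words, after upgrading to Kubernetes v1.18 (v1.18.2 in my case), a Service becomes unreachable once its Pod restarts.

So the focus turns to the proxy. I use ipvs mode, so I checked the ipvs table.
On the master or any other node, run: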
[root@node01 ~]# ipvsadm -L |grep -A 5 10.103.122.243
TCP  10.103.122.243:http rr
  -> 10.244.10.237:9376           Masq    1      0          0         
TCP  10.103.123.173:webcache rr
  -> 10.244.9.26:webcache         Masq    1      0          0         
TCP  10.103.143.242:6379 rr
  -> 10.244.9.229:6379            Masq    1      0          0

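10.103.122.243 is the Service IP, and the forwarding target 10.244.10.237:9376 is the old Pod's address rather than the new Pod's, so access is bound to fail. To confirm this as the cause, correct the forwarding rule by hand:

[root@node01 ~]# ipvsadm -D -t 10.103.122.243:http

[root@node01 ~]# ipvsadm -a -t 10.103.122.243:http -r 10.244.10.238:9376 -m

The first command deletes the old forwarding entry and the second adds the correct one, where 10.244.10.238 is the new Pod's address.

Back in the curl container:

[ root@curl-69c656fd45-ztblt:/ ]$ curl hostnames
hostnames-dd4cc9dd9-5k42b

The result is back to normal, which confirms that the problem is caused by the ipvs rules not being updated. After further investigation, the fix is to upgrade the Linux kernel to version 4 or later.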
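
The manual comparison between the ipvs table and the Pod address can be scripted. This is a rough sketch rather than a finished tool: `ipvs_backends` is a hypothetical helper that parses ipvsadm -L text from stdin, the sample rows are copied from the output above, and the new Pod address is the one mentioned in the text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ipvsadm -L" output on stdin and print the
# real-server (backend) addresses listed under the given virtual service.
ipvs_backends() {
  awk -v vip="$1" '
    $1 == "TCP" || $1 == "UDP" { in_svc = ($2 == vip); next }  # service header row
    in_svc && $1 == "->"       { print $2 }                    # backend row
  '
}

# Sample rows copied from the output above.
sample='TCP  10.103.122.243:http rr
  -> 10.244.10.237:9376           Masq    1      0          0
TCP  10.103.123.173:webcache rr
  -> 10.244.9.26:webcache         Masq    1      0          0'

new_pod="10.244.10.238:9376"   # address of the recreated Pod, from the text
backends=$(printf '%s\n' "$sample" | ipvs_backends 10.103.122.243:http)

if [ "$backends" = "$new_pod" ]; then
  echo "ipvs is up to date"
else
  echo "stale ipvs backend: $backends (expected $new_pod)"
fi
```

On a real node the sample would be replaced by the live ipvsadm -L output, and the expected address would come from the Service's endpoints.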

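Upgrading Kubernetes to 1.16

Posted on 2019-10-16 In k8s

I ran into two problems when upgrading Kubernetes to 1.16:

Running kubeadm upgrade plan reported a failure for the CoreDNS add-on. Fix:
Edit the CoreDNS configuration:

kubectl edit -n kube-system configmaps coredns

Delete the line: proxy . /etc/resolv.conf
Then run the command again and it succeeds.

After the upgrade the nodes were in the NotReady state.
Check the log:

journalctl -f -u kubelet.service

Result: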

10月 16 11:46:21 k8s-master kubelet[11043]: E1016 11:46:21.515027   11043 kubelet.go:2187] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
10月 16 11:46:26 k8s-master kubelet[11043]: W1016 11:46:26.201162   11043 cni.go:202] Error validating CNI config &{cbr0  false [0xc0004a08a0 0xc0004a0940] [123 10 32 32 34 110 97 109 101 34 58 32 34 99 98 114 48 34 44 10 32 32 34 112 108 117 103 105 110 115 34 58 32 91 10 32 32 32 32 123 10 32 32 32 32 32 32 34 116 121 112 101 34 58 32 34 102 108 97 110 110 101 108 34 44 10 32 32 32 32 32 32 34 100 101 108 101 103 97 116 101 34 58 32 123 10 32 32 32 32 32 32 32 32 34 104 97 105 114 112 105 110 77 111 100 101 34 58 32 116 114 117 101 44 10 32 32 32 32 32 32 32 32 34 105 115 68 101 102 97 117 108 116 71 97 116 101 119 97 121 34 58 32 116 114 117 101 10 32 32 32 32 32 32 125 10 32 32 32 32 125 44 10 32 32 32 32 123 10 32 32 32 32 32 32 34 116 121 112 101 34 58 32 34 112 111 114 116 109 97 112 34 44 10 32 32 32 32 32 32 34 99 97 112 97 98 105 108 105 116 105 101 115 34 58 32 123 10 32 32 32 32 32 32 32 32 34 112 111 114 116 77 97 112 112 105 110 103 115 34 58 32 116 114 117 101 10 32 32 32 32 32 32 125 10 32 32 32 32 125 10 32 32 93 10 125 10]}: [plugin flannel does not support config version ""]
10月 16 11:46:26 k8s-master kubelet[11043]: W1016 11:46:26.201211   11043 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d

Edit the flannel configuration:

vi /etc/cni/net.d/10-flannel.conflist

Add the following before "name": "cbr0":

"cniVersion": "0.3.1",

After a short wait the node recovers on its own.
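
With many nodes, the same edit can be applied without opening an editor. The following is a minimal sketch, assuming the file has the shape embedded in the kubelet log above; `add_cni_version` is a hypothetical helper that prints the patched file to stdout, and the demo runs on a temporary copy, never on /etc/cni/net.d directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a conflist with "cniVersion" inserted before the
# first "name" key, unless a cniVersion entry is already present.
add_cni_version() {
  local file="$1" version="${2:-0.3.1}"
  if grep -q '"cniVersion"' "$file"; then
    cat "$file"
    return
  fi
  awk -v ver="$version" '
    !inserted && /"name":/ { printf "  \"cniVersion\": \"%s\",\n", ver; inserted = 1 }
    { print }
  ' "$file"
}

# Demo on a temporary copy shaped like the file embedded in the kubelet log.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
  "name": "cbr0",
  "plugins": [
    { "type": "flannel", "delegate": { "hairpinMode": true, "isDefaultGateway": true } },
    { "type": "portmap", "capabilities": { "portMappings": true } }
  ]
}
EOF
add_cni_version "$tmp"
rm -f "$tmp"
```

Because the helper skips files that already contain cniVersion, running it twice does not insert the key a second time.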

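Access Control for GitLab Pages in Kubernetes

Posted on 2019-09-03 In k8s

I recently deployed GitLab in Kubernetes. GitLab itself serves HTTP; an Ingress in Kubernetes terminates HTTPS in front of it, so external access uses HTTPS.

When I deployed GitLab Pages and turned on Access Control, requests failed with a 503 error. After many attempts I found two solutions:

Drop HTTPS entirely and use HTTP everywhere. This is simple and crude, but insecure.
Change the configuration as follows:

# Set external_url to https. If it is set to http and the Ingress proxies it as https, most features work, but the Web IDE fails to open because it makes HTTP requests from an HTTPS page. GitLab may fix this in the future, but for now this is the only way.
external_url 'https://gitlab.example.cn'

# External URL for Pages. HTTP is used here; the Ingress provides the HTTPS proxy.
pages_external_url "http://example.cn/"

# If deployed in a container, inplace_chroot must be enabled
gitlab_pages['inplace_chroot'] = true

# Access control switch
gitlab_pages['access_control'] = true

# auth_server is the key. It defaults to external_url; because external_url is set to https, Pages fails certificate verification, so authentication is switched to http.
gitlab_pages['auth_server'] = 'http://gitlab.rd.example.cn'

Finally, both the Web IDE and Pages work perfectly in GitLab.

Note: GitLab versions used: V11 to `V12.2`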
