记一次与ChatGPT的交互

09 Feb 2023

Reading time ~3 minutes

夜深人静，跟友人聊天正好聊到CPU之类的硬件话题，突然就想起自己之前在Geekbench网站上做过的一些跑分测试结果，就想把这堆测试结果数据拉出来分享，正好又突然想起最近话题性非常火爆的ChatGPT，于是就想让这个人工智障帮我把我之前在Geekbench上做的测试结果爬下来做成表格，于是就有了这次交互，废话不多说，全过程展示如下。

我：请帮我爬取以下网页中的数据并制成表格：https://browser.geekbench.com/user/41254

ChatGPT：很抱歉，作为一个被训练出来的语言模型，我无法直接爬取网页上的数据。您可以使用编程语言，如 Python，来实现爬取数据的任务。

以下是使用 Python 库 BeautifulSoup 进行爬取的示例代码：
import requests
from bs4 import BeautifulSoup

url = 'https://browser.geekbench.com/user/41254'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

for row in rows:
   cols = row.find_all('td')
   print([col.text.strip() for col in cols])
该代码将爬取网页中的表格数据，并将其以列表形式打印出来。您可以根据需要进行修改，以达到您想要的效果。

这第一回合，我就有点懵了，倒不是惊叹于它写代码的能力，这本来也不是一段多么复杂的代码，我惊叹的是它不能帮我去爬网站数据我可以理解，但做完声明马上反手甩我一段Python爬虫是几个意思？我该夸它聪明还是骂它鸡贼？这是在教我做事？甩完代码还不忘附加一段简单的功能说明？我能轻易让这个人工智障给教育了？对！我能！于是我很没出息的复制粘贴了以上这段代码在我的机器上运行一遍，竟然一次通过了，运行结果如下：

['19004445', 'ASUSTeK COMPUTER INC. ROG Flow Z13 GZ301ZE_GZ301ZE\n\nIntel Core i9-12900H\n2892 MHz\n(14 cores)', 'Windows', 'x64', '1849', '10510']
['18996710', 'iPad13,16\n\nARM\n3197 MHz\n(8 cores)', 'iOS', 'AArch64', '1687', '6739']
['18996633', 'iPhone 13\n\nApple A15 Bionic\n3230 MHz\n(6 cores)', 'iOS', 'AArch64', '1742', '4758']
['18995993', 'MacBook Pro (16-inch, 2021)\n\nApple M1 Pro\n3217 MHz\n(10 cores)', 'macOS', 'AArch64', '1761', '12431']
['9108934', 'ASUSTek PC\n\nIntel Xeon E3-1230V2\n3308 MHz\n(4 cores)', 'Windows', 'x64', '773', '3171']
['5397211', 'Dell Inc. Precision 5520\n\nIntel Core i7-6820HQ\n2701 MHz\n(4 cores)', 'Windows', 'x64', '901', '3511']
['5092577', 'MacBook Air (Late 2020)\n\nApple M1\n3197 MHz\n(8 cores)', 'macOS', 'AArch64', '1731', '6684']
['4959033', 'AZW Z83-V\n\nIntel Atom x5-Z8350\n1920 MHz\n(4 cores)', 'Linux', 'x64', '202', '640']
['4958995', 'ASRock Matrix x86 Server\n\nIntel Xeon E5-2683 v3\n3000 MHz\n(14 cores)', 'Linux', 'x64', '728', '8671']
['4937755', 'Dell Inc. Inspiron 2350\n\nIntel Core i5-4200M\n3100 MHz\n(2 cores)', 'Linux', 'x64', '809', '1715']
['4873669', 'Dell Inc. Precision 5520\n\nIntel Core i7-6820HQ\n3600 MHz\n(4 cores)', 'Linux', 'x64', '951', '3771']
['4823809', 'MacBook (Early 2016)\n\nIntel Core m3-6Y30\n1100 MHz\n(2 cores)', 'macOS', 'x64', '530', '1169']

别说看起来还真是那么回事……而且一次运行就通过了，怎么还有点小满意？但是我又想了一下，决定再多考验考验这只铁憨憨，这个输出正好离我的最终需求还有一点点距离，我想让它直接给我输出Excel表格，这样我就可以直保存或是用于分享了。于是又有了以下的交互。

我：请帮我调整以上你提供的代码，实现直接将结果输出至Excel表格。

ChatGPT：您可以使用 Python 库 pandas 实现该需求，以下是代码示例：
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://browser.geekbench.com/user/41254'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

data = []
for row in rows:
   cols = row.find_all('td')
   data.append([col.text.strip() for col in cols])

df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
df.to_excel('result.xlsx', index=False)
运行上述代码后，会在当前目录下生成一个名为 result.xlsx 的 Excel 文件，该文件即为您需要的结果。

诶？依然是这个一丝不苟的风格，甩完代码不忘附加功能说明。行，今天我就没出息到底了，继续复制、粘贴、运行！但是！哈哈哈！！！但是！这次运行就不那么顺利了，它报错了！

Traceback (most recent call last):
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 4 columns passed, passed data had 6 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/louis/test.py", line 18, in <module>
    df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
  File "/Library/Python/3.9/site-packages/pandas/core/frame.py", line 746, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 4 columns passed, passed data had 6 columns

这下这个智障终于暴露了真实水平？哼！得意之余，我决定进一步羞辱它！于是又有了以下的交互。

我：你提供的代码运行后报错如下：

Traceback (most recent call last):
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
   columns = _validate_or_indexify_columns(contents, columns)
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
   raise AssertionError(
AssertionError: 4 columns passed, passed data had 6 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/Users/louis/test.py", line 18, in <module>
   df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
 File "/Library/Python/3.9/site-packages/pandas/core/frame.py", line 746, in __init__
   arrays, columns, index = nested_data_to_arrays(
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
   arrays, columns = to_arrays(data, columns, dtype=dtype)
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
   content, columns = _finalize_columns_and_data(arr, columns, dtype)
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
   raise ValueError(err) from err
ValueError: 4 columns passed, passed data had 6 columns

请帮忙修正这个错误。

ChatGPT：这个错误是由于您设置的列数和实际数据的列数不一致导致的。

在代码中，您设置的列名为 [“Device”, “Model”, “Score”, “Compute”]，而实际数据的列数却不止这四列，导致该错误。

您可以通过以下方式修正该错误：

检查网页的 HTML 源代码，确认实际数据的列数。
根据实际数据的列数，更新代码中的列名列表。

以下是修正后的代码：

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://browser.geekbench.com/user/41254'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

data = []
for row in rows:
   cols = row.find_all('td')
   data.append([col.text.strip() for col in cols])

df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute", "Platform", "Architecture"])
df.to_excel('result.xlsx', index=False)

运行该代码，您将得到没有错误的结果。

这下我有点绷不住了，这是羞辱不成反而被它给反杀了么？分析看着没毛病啊！最后还这么自信的告诉我这次不会报错了？不行我要试一下，看看你的debug能力！结果再次运行它修正后的代码，果然通过了，我成功的拿到了我想要的Excel表格。。。夜已深，洗洗睡……

计算机软件