
Notes on an Interaction with ChatGPT

09 Feb 2023


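Late one quiet night I was chatting with a friend about CPUs and other hardware when I remembered the benchmark results I had uploaded to the Geekbench website over the years, and I wanted to pull them out and share them. Then I remembered ChatGPT, the hottest topic around lately, and figured I would have this "artificial un-intelligence" scrape my Geekbench results and turn them into a table. That is how this exchange came about. Without further ado, here is the whole thing.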

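Me: Please scrape the data from the following web page and turn it into a table: https://browser.geekbench.com/user/41254

ChatGPT: I'm sorry, but as a trained language model I cannot scrape data from web pages directly. You can use a programming language such as Python to do the scraping.

Here is sample code that does the scraping with the Python library BeautifulSoup: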

import requests
from bs4 import BeautifulSoup

url = 'https://browser.geekbench.com/user/41254'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

for row in rows:
   cols = row.find_all('td')
   print([col.text.strip() for col in cols])

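This code scrapes the table data from the page and prints each row as a list. You can modify it as needed to get the result you want.

After this first round I was a little stunned. Not by its coding ability, since this is hardly a complicated piece of code. I can accept that it cannot scrape the site for me, but what does it mean to make that disclaimer and then immediately toss me a Python scraper? Should I praise it for being clever or scold it for being sly? Is it telling me how to do my job? And after handing over the code it even remembers to append a short description of what it does? Am I really going to let an artificial un-intelligence school me this easily? Yes! Yes I am! So, with no shame at all, I copied and pasted the code and ran it on my machine. It worked on the first try, and the output was: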

['19004445', 'ASUSTeK COMPUTER INC. ROG Flow Z13 GZ301ZE_GZ301ZE\n\nIntel Core i9-12900H\n2892 MHz\n(14 cores)', 'Windows', 'x64', '1849', '10510']
['18996710', 'iPad13,16\n\nARM\n3197 MHz\n(8 cores)', 'iOS', 'AArch64', '1687', '6739']
['18996633', 'iPhone 13\n\nApple A15 Bionic\n3230 MHz\n(6 cores)', 'iOS', 'AArch64', '1742', '4758']
['18995993', 'MacBook Pro (16-inch, 2021)\n\nApple M1 Pro\n3217 MHz\n(10 cores)', 'macOS', 'AArch64', '1761', '12431']
['9108934', 'ASUSTek PC\n\nIntel Xeon E3-1230V2\n3308 MHz\n(4 cores)', 'Windows', 'x64', '773', '3171']
['5397211', 'Dell Inc. Precision 5520\n\nIntel Core i7-6820HQ\n2701 MHz\n(4 cores)', 'Windows', 'x64', '901', '3511']
['5092577', 'MacBook Air (Late 2020)\n\nApple M1\n3197 MHz\n(8 cores)', 'macOS', 'AArch64', '1731', '6684']
['4959033', 'AZW Z83-V\n\nIntel Atom x5-Z8350\n1920 MHz\n(4 cores)', 'Linux', 'x64', '202', '640']
['4958995', 'ASRock Matrix x86 Server\n\nIntel Xeon E5-2683 v3\n3000 MHz\n(14 cores)', 'Linux', 'x64', '728', '8671']
['4937755', 'Dell Inc. Inspiron 2350\n\nIntel Core i5-4200M\n3100 MHz\n(2 cores)', 'Linux', 'x64', '809', '1715']
['4873669', 'Dell Inc. Precision 5520\n\nIntel Core i7-6820HQ\n3600 MHz\n(4 cores)', 'Linux', 'x64', '951', '3771']
['4823809', 'MacBook (Early 2016)\n\nIntel Core m3-6Y30\n1100 MHz\n(2 cores)', 'macOS', 'x64', '530', '1169']
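A side note on that output: the second cell of every row crams the device name, CPU, clock speed, and core count into one string separated by newlines, which will look messy in a spreadsheet. Below is a minimal sketch of splitting it apart. It assumes every cell follows the four-part layout seen above, and `split_system_cell` is a hypothetical helper name, not something from ChatGPT's reply.

```python
def split_system_cell(cell):
    """Split a scraped system cell into its parts.

    Assumes the layout seen in the output above:
    device name, blank line, CPU name, clock speed, core count.
    """
    lines = [line.strip() for line in cell.split("\n") if line.strip()]
    if len(lines) != 4:
        # Unexpected layout: keep the text on one line rather than guess.
        return {"device": " ".join(lines)}
    device, cpu, clock, cores = lines
    return {
        "device": device,
        "cpu": cpu,
        "mhz": int(clock.split()[0]),                # "3230 MHz" -> 3230
        "cores": int(cores.strip("()").split()[0]),  # "(6 cores)" -> 6
    }


# One row copied from the output above.
row = ['18996633', 'iPhone 13\n\nApple A15 Bionic\n3230 MHz\n(6 cores)', 'iOS', 'AArch64', '1742', '4758']
info = split_system_cell(row[1])
print(info)
```

Cells that do not match the expected layout are returned as a single flattened string instead of being parsed, so an unusual entry would not crash the loop.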

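I have to say, that looks about right... and it ran on the first try, so why do I feel a little pleased? But after thinking it over I decided to test this lovable blockhead some more. The output was still a small step away from what I really wanted: an Excel spreadsheet written out directly, so that I could save it or share it as is. Hence the next exchange.

Me: Please adjust the code you provided above so that it writes the results directly to an Excel spreadsheet.

ChatGPT: You can use the Python library pandas for this. Here is sample code: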

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://browser.geekbench.com/user/41254'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

data = []
for row in rows:
   cols = row.find_all('td')
   data.append([col.text.strip() for col in cols])

df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
df.to_excel('result.xlsx', index=False)

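After you run the code above, an Excel file named result.xlsx will be created in the current directory. That file is the result you need.

Huh? Still the same meticulous style: hand over the code, then append a description of what it does. Fine, tonight I'll be shameless to the end. Copy, paste, run! But! Hahaha! But! This run did not go so smoothly. It threw an error!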

Traceback (most recent call last):
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 4 columns passed, passed data had 6 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/louis/test.py", line 18, in <module>
    df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
  File "/Library/Python/3.9/site-packages/pandas/core/frame.py", line 746, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 4 columns passed, passed data had 6 columns
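The last line of the traceback already says what is wrong: four column names were passed, but each scraped row has six cells. As a rough sketch, the same mismatch can be caught before the data ever reaches pandas. `check_widths` is a hypothetical helper, and the sample row is copied from the earlier output.

```python
from collections import Counter


def check_widths(rows, columns):
    """Raise early if any non-empty row is not as wide as `columns`."""
    # Count how many rows have each width; empty rows (if any) are ignored.
    widths = Counter(len(row) for row in rows if row)
    if any(width != len(columns) for width in widths):
        raise ValueError(
            f"{len(columns)} column names given, but row widths are {dict(widths)}"
        )


rows = [
    ['4959033', 'AZW Z83-V\n\nIntel Atom x5-Z8350\n1920 MHz\n(4 cores)', 'Linux', 'x64', '202', '640'],
]

try:
    check_widths(rows, ["Device", "Model", "Score", "Compute"])
except ValueError as err:
    message = str(err)
    print(message)
```

With six names in the list the helper returns without raising, which is the condition the DataFrame constructor needs as well.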

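So the dummy has finally shown its true level? Ha! Feeling smug, I decided to humiliate it further, which led to the next exchange.

Me: The code you provided fails with the following error when run: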

Traceback (most recent call last):
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
   columns = _validate_or_indexify_columns(contents, columns)
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
   raise AssertionError(
AssertionError: 4 columns passed, passed data had 6 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/Users/louis/test.py", line 18, in <module>
   df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
 File "/Library/Python/3.9/site-packages/pandas/core/frame.py", line 746, in __init__
   arrays, columns, index = nested_data_to_arrays(
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
   arrays, columns = to_arrays(data, columns, dtype=dtype)
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
   content, columns = _finalize_columns_and_data(arr, columns, dtype)
 File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
   raise ValueError(err) from err
ValueError: 4 columns passed, passed data had 6 columns

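Please fix this error.

ChatGPT: This error occurs because the number of columns you specified does not match the number of columns in the actual data.

In the code, the column names you set are ["Device", "Model", "Score", "Compute"], but the actual data has more than these four columns, which causes the error.

You can fix the error as follows:

  1. Check the page's HTML source to confirm the actual number of columns in the data.
  2. Update the list of column names in the code to match that number.

Here is the corrected code: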

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://browser.geekbench.com/user/41254'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

data = []
for row in rows:
   cols = row.find_all('td')
   data.append([col.text.strip() for col in cols])

df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute", "Platform", "Architecture"])
df.to_excel('result.xlsx', index=False)

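Run this code and you will get the result without errors.

Now I was struggling to keep a straight face. Did my attempt at humiliation just get turned around on me? The analysis looks sound! And it ends by confidently telling me there will be no error this time? No way, I have to try it and see what your debugging skills are really like! I ran the corrected code and, sure enough, it worked. I got the Excel spreadsheet I wanted... It's late. Time to wash up and go to bed.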
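One caveat about that "fix": the error is gone, but the six labels do not describe the six cells. Judging only from the rows printed earlier, the cells look like a result ID, the system description, the platform, the architecture, and two scores (presumably single-core and multi-core). The sketch below pairs each cell with a label on that assumption. The label names are guesses inferred from the data, not read from the page's header row, and plain dictionaries are used so it runs without pandas.

```python
# Labels inferred from the printed rows; they are assumptions, not the
# page's actual header text.
COLUMNS = ["ID", "System", "Platform", "Architecture", "Single-Core", "Multi-Core"]

# Two rows copied from the output shown earlier.
data = [
    ['5092577', 'MacBook Air (Late 2020)\n\nApple M1\n3197 MHz\n(8 cores)', 'macOS', 'AArch64', '1731', '6684'],
    ['4823809', 'MacBook (Early 2016)\n\nIntel Core m3-6Y30\n1100 MHz\n(2 cores)', 'macOS', 'x64', '530', '1169'],
]

# Pair each cell with its label; skip rows whose width does not match.
records = [dict(zip(COLUMNS, row)) for row in data if len(row) == len(COLUMNS)]

for record in records:
    print(record["Platform"], record["Architecture"],
          record["Single-Core"], record["Multi-Core"])
```

The same `COLUMNS` list could be passed as `columns=` to `pd.DataFrame` in place of the six names ChatGPT chose.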



