夜深人静,跟友人聊天正好聊到CPU之类的硬件话题,突然就想起自己之前在Geekbench网站上做过的一些跑分测试结果,就想把这堆测试结果数据拉出来分享,正好又突然想起最近话题性非常火爆的ChatGPT,于是就想让这个人工智障帮我把我之前在Geekbench上做的测试结果爬下来做成表格,于是就有了这次交互,废话不多说,全过程展示如下。
我:请帮我爬取以下网页中的数据并制成表格:https://browser.geekbench.com/user/41254
ChatGPT:很抱歉,作为一个被训练出来的语言模型,我无法直接爬取网页上的数据。您可以使用编程语言,如 Python,来实现爬取数据的任务。
以下是使用 Python 库 BeautifulSoup 进行爬取的示例代码:
import requests from bs4 import BeautifulSoup url = 'https://browser.geekbench.com/user/41254' response = requests.get(url) soup = BeautifulSoup(response.text, 'html.parser') table = soup.find('table') rows = table.find_all('tr') for row in rows: cols = row.find_all('td') print([col.text.strip() for col in cols])
该代码将爬取网页中的表格数据,并将其以列表形式打印出来。您可以根据需要进行修改,以达到您想要的效果。
这第一回合,我就有点懵了,倒不是惊叹于它写代码的能力,这本来也不是一段多么复杂的代码,我惊叹的是它不能帮我去爬网站数据我可以理解,但做完声明马上反手甩我一段Python爬虫是几个意思?我该夸它聪明还是骂它鸡贼?这是在教我做事?甩完代码还不忘附加一段简单的功能说明?我能轻易让这个人工智障给教育了?对!我能!于是我很没出息的复制粘贴了以上这段代码在我的机器上运行一遍,竟然一次通过了,运行结果如下:
['19004445', 'ASUSTeK COMPUTER INC. ROG Flow Z13 GZ301ZE_GZ301ZE\n\nIntel Core i9-12900H\n2892 MHz\n(14 cores)', 'Windows', 'x64', '1849', '10510']
['18996710', 'iPad13,16\n\nARM\n3197 MHz\n(8 cores)', 'iOS', 'AArch64', '1687', '6739']
['18996633', 'iPhone 13\n\nApple A15 Bionic\n3230 MHz\n(6 cores)', 'iOS', 'AArch64', '1742', '4758']
['18995993', 'MacBook Pro (16-inch, 2021)\n\nApple M1 Pro\n3217 MHz\n(10 cores)', 'macOS', 'AArch64', '1761', '12431']
['9108934', 'ASUSTek PC\n\nIntel Xeon E3-1230V2\n3308 MHz\n(4 cores)', 'Windows', 'x64', '773', '3171']
['5397211', 'Dell Inc. Precision 5520\n\nIntel Core i7-6820HQ\n2701 MHz\n(4 cores)', 'Windows', 'x64', '901', '3511']
['5092577', 'MacBook Air (Late 2020)\n\nApple M1\n3197 MHz\n(8 cores)', 'macOS', 'AArch64', '1731', '6684']
['4959033', 'AZW Z83-V\n\nIntel Atom x5-Z8350\n1920 MHz\n(4 cores)', 'Linux', 'x64', '202', '640']
['4958995', 'ASRock Matrix x86 Server\n\nIntel Xeon E5-2683 v3\n3000 MHz\n(14 cores)', 'Linux', 'x64', '728', '8671']
['4937755', 'Dell Inc. Inspiron 2350\n\nIntel Core i5-4200M\n3100 MHz\n(2 cores)', 'Linux', 'x64', '809', '1715']
['4873669', 'Dell Inc. Precision 5520\n\nIntel Core i7-6820HQ\n3600 MHz\n(4 cores)', 'Linux', 'x64', '951', '3771']
['4823809', 'MacBook (Early 2016)\n\nIntel Core m3-6Y30\n1100 MHz\n(2 cores)', 'macOS', 'x64', '530', '1169']
别说看起来还真是那么回事……而且一次运行就通过了,怎么还有点小满意?但是我又想了一下,决定再多考验考验这只铁憨憨,这个输出正好离我的最终需求还有一点点距离,我想让它直接给我输出Excel表格,这样我就可以直保存或是用于分享了。于是又有了以下的交互。
我:请帮我调整以上你提供的代码,实现直接将结果输出至Excel表格。
ChatGPT:您可以使用 Python 库 pandas 实现该需求,以下是代码示例:
import requests from bs4 import BeautifulSoup import pandas as pd url = 'https://browser.geekbench.com/user/41254' response = requests.get(url) soup = BeautifulSoup(response.text, 'html.parser') table = soup.find('table') rows = table.find_all('tr') data = [] for row in rows: cols = row.find_all('td') data.append([col.text.strip() for col in cols]) df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"]) df.to_excel('result.xlsx', index=False)
运行上述代码后,会在当前目录下生成一个名为 result.xlsx 的 Excel 文件,该文件即为您需要的结果。
诶?依然是这个一丝不苟的风格,甩完代码不忘附加功能说明。行,今天我就没出息到底了,继续复制、粘贴、运行!但是!哈哈哈!!!但是!这次运行就不那么顺利了,它报错了!
Traceback (most recent call last):
File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
columns = _validate_or_indexify_columns(contents, columns)
File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
raise AssertionError(
AssertionError: 4 columns passed, passed data had 6 columns
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/louis/test.py", line 18, in <module>
df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"])
File "/Library/Python/3.9/site-packages/pandas/core/frame.py", line 746, in __init__
arrays, columns, index = nested_data_to_arrays(
File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
content, columns = _finalize_columns_and_data(arr, columns, dtype)
File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
raise ValueError(err) from err
ValueError: 4 columns passed, passed data had 6 columns
这下这个智障终于暴露了真实水平?哼!得意之余,我决定进一步羞辱它!于是又有了以下的交互。
我:你提供的代码运行后报错如下:
Traceback (most recent call last): File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data columns = _validate_or_indexify_columns(contents, columns) File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns raise AssertionError( AssertionError: 4 columns passed, passed data had 6 columns The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/Users/louis/test.py", line 18, in <module> df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute"]) File "/Library/Python/3.9/site-packages/pandas/core/frame.py", line 746, in __init__ arrays, columns, index = nested_data_to_arrays( File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays arrays, columns = to_arrays(data, columns, dtype=dtype) File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays content, columns = _finalize_columns_and_data(arr, columns, dtype) File "/Library/Python/3.9/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data raise ValueError(err) from err ValueError: 4 columns passed, passed data had 6 columns
请帮忙修正这个错误。
ChatGPT:这个错误是由于您设置的列数和实际数据的列数不一致导致的。
在代码中,您设置的列名为 [“Device”, “Model”, “Score”, “Compute”],而实际数据的列数却不止这四列,导致该错误。
您可以通过以下方式修正该错误:
- 检查网页的 HTML 源代码,确认实际数据的列数。
- 根据实际数据的列数,更新代码中的列名列表。
以下是修正后的代码:
import requests from bs4 import BeautifulSoup import pandas as pd url = 'https://browser.geekbench.com/user/41254' response = requests.get(url) soup = BeautifulSoup(response.text, 'html.parser') table = soup.find('table') rows = table.find_all('tr') data = [] for row in rows: cols = row.find_all('td') data.append([col.text.strip() for col in cols]) df = pd.DataFrame(data, columns=["Device", "Model", "Score", "Compute", "Platform", "Architecture"]) df.to_excel('result.xlsx', index=False)
运行该代码,您将得到没有错误的结果。
这下我有点绷不住了,这是羞辱不成反而被它给反杀了么?分析看着没毛病啊!最后还这么自信的告诉我这次不会报错了?不行我要试一下,看看你的debug能力!结果再次运行它修正后的代码,果然通过了,我成功的拿到了我想要的Excel表格。。。夜已深,洗洗睡……