Question:

Reading a gz file from Google Cloud Storage with Python (Jupyter)

汤兴生

2023-03-14

I am trying to read a gz file from Google Cloud Storage with Python in a Jupyter notebook.

My first attempt raises this error:

TypeError: cannot convert str to bytes

from google.cloud import storage
import pandas as pd
from io import StringIO

client = storage.Client()
bucket = client.get_bucket("nttcomware")
blob = bucket.get_blob(f"test.csv.gz")
df = pd.read_csv(s, compression='gzip', float_precision="high")
df.head()

My second attempt raises a different error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

from google.cloud import storage
import pandas as pd
from io import StringIO

client = storage.Client()
bucket = client.get_bucket("nttcomware")
blob = bucket.get_blob(f"test.csv.gz")
bt = blob.download_as_string()
s = str(bt, "utf-8")
s = StringIO(s)
df = pd.read_csv(s, compression='gzip', float_precision="high")
df.head()

Please advise.
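The second error can be reproduced without a bucket. As a minimal sketch (the sample CSV text and the variable names are made up for illustration), every gzip stream starts with the bytes 0x1f 0x8b, and 0x8b is not a valid first byte of a UTF-8 sequence, so decoding the still-compressed bytes fails at position 1:

```python
import gzip

# Hypothetical sample data standing in for the contents of test.csv.gz.
csv_text = "id,value\n1,0.5\n2,1.25\n"
raw = gzip.compress(csv_text.encode("utf-8"))

# Every gzip stream begins with the magic bytes 0x1f 0x8b.
magic = raw[:2]

# Decoding the compressed bytes as UTF-8 fails on the second byte (0x8b).
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decompressing first and decoding afterwards works.
recovered = gzip.decompress(raw).decode("utf-8")
```

This suggests the downloaded bytes have to be decompressed before they are decoded or wrapped in a StringIO.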

2 answers

仉宪

2023-03-14

This worked for me for reading a json.gz file directly from GCS into a DataFrame.

import gzip
import io

import pandas as pd
from google.cloud import storage

client = storage.Client()

def gcs_read_json_gz(gcs_filepath, nrows=None):

    # Validate input path
    if not gcs_filepath.startswith("gs://") or not gcs_filepath.endswith(".json.gz"):
        raise ValueError(f"Invalid path: {gcs_filepath}")

    # Get the bucket
    bucket_name = gcs_filepath.split("/")[2]
    bucket = client.get_bucket(bucket_name)

    # Get the blob object
    blob_name = "/".join(gcs_filepath.split("/")[3:])
    blob = bucket.get_blob(blob_name)

    # Download the blob as bytes and wrap it in a BytesIO object. Still compressed by gzip
    data = io.BytesIO(blob.download_as_string())

    # Decompress the gzip stream
    with gzip.open(data) as gz:
        # Read the decompressed content as bytes
        file = gz.read()
        # Decode the bytes into a string as utf-8
        blob_decompress = file.decode('utf-8')
        # Wrap the string in a StringIO object
        s = io.StringIO(blob_decompress)

    # precise_float takes a boolean; nrows is only supported with lines=True
    df = pd.read_json(s, precise_float=True, nrows=nrows, lines=True)

    return df
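The path handling in the function above can be pulled out into a small helper. This is only a sketch: `split_gcs_uri` is a hypothetical name, not part of google-cloud-storage, and the bucket and object names in the usage line are made up.

```python
def split_gcs_uri(gcs_filepath):
    """Split 'gs://bucket/path/to/object' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not gcs_filepath.startswith(prefix):
        raise ValueError(f"Invalid path: {gcs_filepath}")

    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket_name, _, blob_name = gcs_filepath[len(prefix):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError(f"Invalid path: {gcs_filepath}")

    return bucket_name, blob_name


# Hypothetical example path.
bucket_name, blob_name = split_gcs_uri("gs://example-bucket/dir/data.json.gz")
```

Using `partition` instead of repeated `split("/")` calls keeps slashes inside the object name intact and rejects paths that have no object part.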

羊舌赞

2023-03-14

Fortunately I solved it myself. I hope this helps others.

import gzip
import io

import pandas as pd
from google.cloud import storage

client = storage.Client()

# get the bucket
bucket = client.get_bucket("nttcomware")

# get the blob object
blob_name = "test.csv.gz"
blob = bucket.get_blob(blob_name)

# download the blob as bytes and wrap it in a BytesIO object. Still compressed by gzip
data = io.BytesIO(blob.download_as_string())

# decompress the gzip stream
with gzip.open(data) as gz:
    # decompressed content, still bytes
    file = gz.read()

# decode the decompressed bytes into a string as utf-8 and wrap it in a StringIO object
# (no second download of an uncompressed "test.csv" blob is needed)
s = io.StringIO(file.decode('utf-8'))

df = pd.read_csv(s, float_precision="high")
df.head()
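As a possible shortcut, pandas can decompress the bytes itself when they are passed as a binary buffer. The sketch below assumes a reasonably recent pandas version and uses locally compressed sample bytes as a stand-in for `blob.download_as_string()`, so it runs without a bucket; the sample data is made up.

```python
import gzip
import io

import pandas as pd

# Hypothetical stand-in for blob.download_as_string(): gzip-compressed CSV bytes.
raw = gzip.compress(b"id,value\n1,0.5\n2,1.25\n")

# compression is given explicitly because it cannot be inferred
# from a buffer that has no file name.
df = pd.read_csv(io.BytesIO(raw), compression="gzip", float_precision="high")
```

The key difference from the failing code in the question is that the buffer is a BytesIO over the compressed bytes, not a StringIO over decoded text.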
