I am more or less following this example to integrate the Ray Tune hyperparameter search library with the Hugging Face transformers library, using my own dataset.
Here is my script:
import ray
from ray import tune
from ray.tune import CLIReporter
from ray.tune.examples.pbt_transformers.utils import download_data, \
    build_compute_metrics_fn
from ray.tune.schedulers import PopulationBasedTraining
from transformers import glue_tasks_num_labels, AutoConfig, \
    AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from transformers import ElectraForSequenceClassification

def get_model():
    # tokenizer = AutoTokenizer.from_pretrained(model_name, additional_special_tokens=['[CHARACTER]'])
    model = ElectraForSequenceClassification.from_pretrained(
        'google/electra-small-discriminator', num_labels=2)
    model.resize_token_embeddings(len(tokenizer))
    return model

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

training_args = TrainingArguments(
    "electra_hp_tune",
    report_to="wandb",
    learning_rate=2e-5,              # config
    do_train=True,
    do_eval=True,
    evaluation_strategy="epoch",
    load_best_model_at_end=True,
    num_train_epochs=2,              # config
    per_device_train_batch_size=16,  # config
    per_device_eval_batch_size=16,   # config
    warmup_steps=0,
    weight_decay=0.1,                # config
    logging_dir="./logs",
)

trainer = Trainer(
    model_init=get_model,
    args=training_args,
    train_dataset=chunked_encoded_dataset['train'],
    eval_dataset=chunked_encoded_dataset['validation'],
    compute_metrics=compute_metrics
)

tune_config = {
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "num_train_epochs": tune.choice([2, 3, 4, 5])
}

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_acc",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.0, 0.3),
        "learning_rate": tune.uniform(1e-5, 2.5e-5),
        "per_device_train_batch_size": [16, 32, 64],
    })

reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "per_device_train_batch_size": "train_bs/gpu",
        "num_train_epochs": "num_epochs"
    },
    metric_columns=[
        "eval_f1", "eval_loss", "epoch", "training_iteration"
    ])

from ray.tune.integration.wandb import WandbLogger

trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    n_trials=10,
    scheduler=scheduler,
    keep_checkpoints_num=1,
    checkpoint_score_attr="training_iteration",
    progress_reporter=reporter,
    name="tune_transformer_gr")
The last function call (to trainer.hyperparameter_search) is the one that raises the error. The error message is:
AttributeError: module 'pickle' has no attribute 'PickleBuffer'
Here is the full stack trace:
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      8     checkpoint_score_attr="training_iteration",
      9     progress_reporter=reporter,

14 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in hyperparameter_search(self, hp_space, compute_objective, n_trials, direction, backend, hp_name, **kwargs)
/usr/local/lib/python3.7/dist-packages/transformers/integrations.py in run_hp_search_ray(trainer, n_trials, direction, **kwargs)
/usr/local/lib/python3.7/dist-packages/ray/tune/utils/trainable.py in with_parameters(trainable, **kwargs)
/usr/local/lib/python3.7/dist-packages/ray/tune/registry.py in put(self, k, v)
/usr/local/lib/python3.7/dist-packages/ray/tune/registry.py in flush(self)
/usr/local/lib/python3.7/dist-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/ray/worker.py in put(value)
/usr/local/lib/python3.7/dist-packages/ray/worker.py in put_object(self, value, object_ref)
/usr/local/lib/python3.7/dist-packages/ray/serialization.py in serialize(self, value)
/usr/local/lib/python3.7/dist-packages/ray/serialization.py in _serialize_to_msgpack(self, value)
/usr/local/lib/python3.7/dist-packages/ray/serialization.py in _serialize_to_pickle5(self, metadata, value)
/usr/local/lib/python3.7/dist-packages/ray/serialization.py in _serialize_to_pickle5(self, metadata, value)   # inband = pickle.dumps(...)
/usr/local/lib/python3.7/dist-packages/ray/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
/usr/local/lib/python3.7/dist-packages/ray/cloudpickle/cloudpickle_fast.py in dump(self, obj)
/usr/local/lib/python3.7/dist-packages/pyarrow/io.pxi in pyarrow.lib.Buffer.__reduce_ex__()

AttributeError: module 'pickle' has no attribute 'PickleBuffer'
My environment setup:
What I have tried:
Where is this error coming from, and how can I fix it?
I got the same error when trying to use pickle.dump(), and for me downgrading pickle5 from version 0.0.11 to 0.0.10 fixed it.
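For reference, here is a minimal diagnostic sketch you can run in the same environment (it assumes setuptools/pkg_resources is available, as it is on a Colab runtime). The point is that pickle.PickleBuffer only exists in the standard library from Python 3.8 onward, so on the Python 3.7 runtime shown in the traceback Ray/pyarrow have to rely on the pickle5 backport, and the installed backport version is what mattered for me:

import sys
import pickle
import pkg_resources

# pickle.PickleBuffer was added to the standard library in Python 3.8;
# on a plain Python 3.7 install (as in the traceback above) this prints False.
print(sys.version)
print(hasattr(pickle, "PickleBuffer"))

# Check which pickle5 backport is installed; downgrading it to 0.0.10 was what worked for me.
try:
    print(pkg_resources.get_distribution("pickle5").version)
except pkg_resources.DistributionNotFound:
    print("pickle5 backport not installed")

The downgrade itself was just installing pickle5==0.0.10 with pip and then restarting the runtime so the already-imported modules pick up the change.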