I'm working on improving my Azure Data Factory skills by comparing the performance of a Copy activity against a Mapping Data Flow when writing a single CSV file to Azure Blob Storage.
When I write a single CSV with a Copy activity, going through my Azure Blob Storage linked service (azureBlobLinkedService) via a dataset (azureBlobSingleCSVFileNameDataset), I get the output I expect in my Blob Storage container, e.g. an output file MyData.csv in container MyContainer under folder /output/csv/singleFiles.
When I write a single CSV with a Mapping Data Flow, going through the same Blob Storage linked service but via a different dataset (azureBlobSingleCSVNoFileNameDataset), I get the expected file, but also an extra zero-length file in the same folder (screenshot omitted).
I don't understand why a zero-length file is generated when I use the Mapping Data Flow.
Here are my source files:
linkedService/azureBlobLinkedService
{
    "name": "azureBlobLinkedService",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "AzureBlobStorage",
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@{linkedService().azureBlobConnectionStringSecretName}"
            }
        }
    }
}
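(For reference, the AzureKeyVaultLinkedService referenced above is not among the files below; it is a standard Key Vault linked service. A minimal sketch, with a placeholder vault URL:)
{
    "name": "AzureKeyVaultLinkedService",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://<your-key-vault-name>.vault.azure.net/"
        }
    }
}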
dataset/azureBlobSingleCSVFileNameDataset
{
    "name": "azureBlobSingleCSVFileNameDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "azureBlobLinkedService",
            "type": "LinkedServiceReference",
            "parameters": {
                "azureBlobConnectionStringSecretName": {
                    "value": "@dataset().azureBlobConnectionStringSecretName",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "azureBlobSingleCSVFileName": {
                "type": "string"
            },
            "azureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "azureBlobSingleCSVContainerName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": {
                    "value": "@dataset().azureBlobSingleCSVFileName",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@dataset().azureBlobSingleCSVFolderPath",
                    "type": "Expression"
                },
                "container": {
                    "value": "@dataset().azureBlobSingleCSVContainerName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
pipeline/Azure SQL Table to Blob Single CSV Copy Pipeline (this produces the expected result)
{
    "name": "Azure SQL Table to Blob Single CSV Copy Pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Azure SQL Table to Blob Single CSV",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "AzureSqlSource",
                        "queryTimeout": "02:00:00"
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".csv"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "azureSqlDatabaseTableDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "azureSqlDatabaseConnectionStringSecretName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
                                "type": "Expression"
                            },
                            "azureSqlDatabaseTableSchemaName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
                                "type": "Expression"
                            },
                            "azureSqlDatabaseTableTableName": {
                                "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
                                "type": "Expression"
                            }
                        }
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "azureBlobSingleCSVFileNameDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "azureBlobConnectionStringSecretName": {
                                "value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVFileName": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFileName",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVFolderPath": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
                                "type": "Expression"
                            },
                            "azureBlobSingleCSVContainerName": {
                                "value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "sourceAzureSqlDatabaseConnectionStringSecretName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableSchemaName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableTableName": {
                "type": "string"
            },
            "sinkAzureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVContainerName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFileName": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
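For concreteness, with the parameter values from the example in the introduction (sinkAzureBlobSingleCSVContainerName = MyContainer, sinkAzureBlobSingleCSVFolderPath = output/csv/singleFiles, sinkAzureBlobSingleCSVFileName = MyData.csv), the sink dataset's location resolves to:
{
    "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": "MyData.csv",
        "folderPath": "output/csv/singleFiles",
        "container": "MyContainer"
    }
}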
dataset/azureBlobSingleCSVNoFileNameDataset (no file name in the dataset, as required for Mapping Data Flows; the file name is set inside the Mapping Data Flow)
{
    "name": "azureBlobSingleCSVNoFileNameDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "azureBlobLinkedService",
            "type": "LinkedServiceReference",
            "parameters": {
                "azureBlobConnectionStringSecretName": {
                    "value": "@dataset().azureBlobConnectionStringSecretName",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "azureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "azureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "azureBlobSingleCSVContainerName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@dataset().azureBlobSingleCSVFolderPath",
                    "type": "Expression"
                },
                "container": {
                    "value": "@dataset().azureBlobSingleCSVContainerName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
dataFlow/azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow
{
    "name": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "azureSqlDatabaseTableDataset",
                        "type": "DatasetReference"
                    },
                    "name": "readFromAzureSqlDatabase"
                }
            ],
            "sinks": [
                {
                    "dataset": {
                        "referenceName": "azureBlobSingleCSVNoFileNameDataset",
                        "type": "DatasetReference"
                    },
                    "name": "writeToAzureBlobSingleCSV"
                }
            ],
            "transformations": [
                {
                    "name": "enrichWithRuntimeMetadata"
                }
            ],
            "script": "\nparameters{\n\tsourceConnectionSecretName as string,\n\tsinkConnectionStringSecretName as string,\n\tsourceObjectName as string,\n\tsinkObjectName as string,\n\tdataFactoryName as string,\n\tdataFactoryPipelineName as string,\n\tdataFactoryPipelineRunId as string,\n\tsinkFileNameNoPath as string\n}\nsource(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> readFromAzureSqlDatabase\nreadFromAzureSqlDatabase derive({__sourceConnectionStringSecretName} = $sourceConnectionSecretName,\n\t\t{__sinkConnectionStringSecretName} = $sinkConnectionStringSecretName,\n\t\t{__sourceObjectName} = $sourceObjectName,\n\t\t{__sinkObjectName} = $sinkObjectName,\n\t\t{__dataFactoryName} = $dataFactoryName,\n\t\t{__dataFactoryPipelineName} = $dataFactoryPipelineName,\n\t\t{__dataFactoryPipelineRunId} = $dataFactoryPipelineRunId) ~> enrichWithRuntimeMetadata\nenrichWithRuntimeMetadata sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tpartitionFileNames:[($sinkFileNameNoPath)],\n\tpartitionBy('hash', 1),\n\tquoteAll: true) ~> writeToAzureBlobSingleCSV"
        }
    }
}
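For readability, here is the sink transformation from the escaped script string above, expanded into plain data flow script (content unchanged). The partitionBy('hash', 1) setting forces a single output partition, and partitionFileNames makes that partition be written under exactly the name passed in $sinkFileNameNoPath:
enrichWithRuntimeMetadata sink(allowSchemaDrift: true,
    validateSchema: false,
    partitionFileNames:[($sinkFileNameNoPath)],
    partitionBy('hash', 1),
    quoteAll: true) ~> writeToAzureBlobSingleCSV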
pipeline/Azure SQL Table to Blob Single CSV Data Flow Pipeline (this produces the expected result, plus an extra zero-byte file in the folder path)
{
    "name": "Azure SQL Table to Blob Single CSV Data Flow Pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy Sql Database Table To Blob Single CSV Data Flow",
                "type": "ExecuteDataFlow",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "azureSqlDatabaseTableToAzureBlobSingleCSVDataFlow",
                        "type": "DataFlowReference",
                        "parameters": {
                            "sourceConnectionSecretName": {
                                "value": "'@{pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName}'",
                                "type": "Expression"
                            },
                            "sinkConnectionStringSecretName": {
                                "value": "'@{pipeline().parameters.sinkAzureBlobConnectionStringSecretName}'",
                                "type": "Expression"
                            },
                            "sourceObjectName": {
                                "value": "'@{concat('[', pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName, '].[', pipeline().parameters.sourceAzureSqlDatabaseTableTableName, ']')}'",
                                "type": "Expression"
                            },
                            "sinkObjectName": {
                                "value": "'@{concat(pipeline().parameters.sinkAzureBlobSingleCSVContainerName, '/', pipeline().parameters.sinkAzureBlobSingleCSVFolderPath, '/', pipeline().parameters.sinkAzureBlobSingleCSVFileName)}'",
                                "type": "Expression"
                            },
                            "dataFactoryName": {
                                "value": "'@{pipeline().DataFactory}'",
                                "type": "Expression"
                            },
                            "dataFactoryPipelineName": {
                                "value": "'@{pipeline().Pipeline}'",
                                "type": "Expression"
                            },
                            "dataFactoryPipelineRunId": {
                                "value": "'@{pipeline().RunId}'",
                                "type": "Expression"
                            },
                            "sinkFileNameNoPath": {
                                "value": "'@{pipeline().parameters.sinkAzureBlobSingleCSVFileName}'",
                                "type": "Expression"
                            }
                        },
                        "datasetParameters": {
                            "readFromAzureSqlDatabase": {
                                "azureSqlDatabaseConnectionStringSecretName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseConnectionStringSecretName",
                                    "type": "Expression"
                                },
                                "azureSqlDatabaseTableSchemaName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableSchemaName",
                                    "type": "Expression"
                                },
                                "azureSqlDatabaseTableTableName": {
                                    "value": "@pipeline().parameters.sourceAzureSqlDatabaseTableTableName",
                                    "type": "Expression"
                                }
                            },
                            "writeToAzureBlobSingleCSV": {
                                "azureBlobConnectionStringSecretName": {
                                    "value": "@pipeline().parameters.sinkAzureBlobConnectionStringSecretName",
                                    "type": "Expression"
                                },
                                "azureBlobSingleCSVFolderPath": {
                                    "value": "@pipeline().parameters.sinkAzureBlobSingleCSVFolderPath",
                                    "type": "Expression"
                                },
                                "azureBlobSingleCSVContainerName": {
                                    "value": "@pipeline().parameters.sinkAzureBlobSingleCSVContainerName",
                                    "type": "Expression"
                                }
                            }
                        }
                    },
                    "compute": {
                        "coreCount": 8,
                        "computeType": "General"
                    }
                }
            }
        ],
        "parameters": {
            "sourceAzureSqlDatabaseConnectionStringSecretName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableSchemaName": {
                "type": "string"
            },
            "sourceAzureSqlDatabaseTableTableName": {
                "type": "string"
            },
            "sinkAzureBlobConnectionStringSecretName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVContainerName": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFolderPath": {
                "type": "string"
            },
            "sinkAzureBlobSingleCSVFileName": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
A zero-length (zero-byte) file generally means that, although the pipeline run may have succeeded, that particular stage returned or produced no output.
One good way to narrow this down is to preview the output of each stage (for example, with Data Flow Debug enabled) to confirm that every transformation produces the output you expect.
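If the extra zero-byte blob turns out to be a side effect of how the sink writes the single partition (Blob Storage has no real directories, so an empty placeholder blob can appear alongside the data file), one workaround, not from the original post, is to chain a Delete activity after the Execute Data Flow activity to remove it. A minimal sketch, assuming a hypothetical dataset azureBlobZeroByteFileDataset that points at the zero-byte blob:
{
    "name": "Delete Zero Byte Artifact",
    "type": "Delete",
    "dependsOn": [
        {
            "activity": "Copy Sql Database Table To Blob Single CSV Data Flow",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "dataset": {
            "referenceName": "azureBlobZeroByteFileDataset",
            "type": "DatasetReference"
        },
        "enableLogging": false,
        "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "recursive": false
        }
    }
}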