
Need a time-efficient way to import a large CSV file into multiple MySQL tables via PHP

鲁明知
2023-03-14
Question

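OK, I'm having some serious issues here. I'm new to this site and new to importing CSV data via PHP, but I'm not new to programming.

Currently, I'm building a Customer Relationship Manager. I need to create a script that imports a file and populates the database with leads. The main issue is that the lead data consists of companies and the employees of those companies. In addition, a few other tables, such as billing information, are split off from the main tables.

I have a working script that lets users map the imported data to specific rows and columns.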

function mapData($file) {
    // Open the Text File
    $fd = fopen($file, "r");

    // Return FALSE if file not found
    if(!$fd) {
        return FALSE;
    }

    // Get the First Two Lines
    $first = 0;
    $data = array();
    while(!feof($fd)) {
        if($first == 0) {
            $cols = fgetcsv($fd, 4096);
            $data['cols'] = array();
            if(is_array($cols) && count($cols)) {
                foreach($cols as $col) {
                    if(!$col) {
                        continue;
                    }
                    $data['cols'][] = $col;
                }
            }
            if(empty($data['cols'])) {
                return array();
            }
            $first++;
            continue;
        }
        else {
            $data['first'] = fgetcsv($fd, 4096);
            break;
        }
    }
    fclose($fd);

    // Return Data
    return $data;
}

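The script above runs only after CodeIgniter has moved the file to a working directory, so by that point I already know the file name. The file goes in, and the function returns the list of columns and the first row. Any empty columns are ignored.

After this, the process moves on to a mapping script. Once the mapping is done and "Import" is pressed, this code is loaded.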

function importLeads($file, $map) {
    // Open the Text File
    if(!file_exists($file)) {
        return false;
    }
    error_reporting(E_ALL);
    set_time_limit(240);
    ini_set("memory_limit", "512M");
    $fd = fopen($file, "r");

    // Return FALSE if file not found
    if(!$fd) {
        return FALSE;
    }

    // Traverse Each Line of the File
    $true = false;
    $first = 0;
    while(!feof($fd)) {
        if($first == 0) {
            $cols = fgetcsv($fd);
            $first++;
            continue;
        }

        // Get the columns of each line
        $row = fgetcsv($fd);

        // Traverse columns
        $group = array();
        $lead_status = array();
        $lead_type = array();
        $lead_source = array();
        $user = array();
        $user_cstm = array();
        $user_prof = array();
        $acct = array();
        $acct_cstm = array();
        $acct_prof = array();
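        $acct_group = array();
        // Reset per row so values from a previous line do not leak into this one
        $billing = array();
        $campaign = array();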
        if(!$row) {
            continue;
        }
        foreach($row as $num => $val) {
            if(empty($map[$num])) {
                continue;
            }
            $val = str_replace('"', "&quot;", $val);
            $val = str_replace("'", "&#39;", $val);
            switch($map[$num]) {
            // Company Account
            case "company_name":
                $acct['company_name'] = $val;
                break;
            case "lead_type":
                $lead_type['name'] = $val;
                break;
            case "lead_source":
                $lead_source['name'] = $val;
                break;
            case "lead_source_description":
                $lead_source['name'] = $val;
                break;
            case "campaign":
                $campaign['name'] = $val;
                break;
            case "mcn":
                $acct['mcn'] = $val;
                break;
            case "usdot":
                $acct['usdot'] = $val;
                break;
            case "sic_codes":
                $acct_cstm['sic_codes'] = $val;
                break;
            case "naics_codes":
                $acct_cstm['naics_codes'] = $val;
                break;
            case "agent_assigned":
                $acct_cstm['agent_assigned'] = $val;
                break;
            case "group_assigned":
                $group['name'] = $val;
                break;
            case "rating":
                $acct_cstm['rating'] = $val;
                break;
            case "main_phone":
                $acct['phone'] = $val;
                break;
            case "billing_phone":
                $acct_cstm['billing_phone'] = $val;
                break;
            case "company_fax":
                $acct['fax'] = $val;
                break;
            case "company_email":
                $acct['email2'] = $val;
                break;

            // Company Location
            case "primary_address":
                $acct['address'] = $val;
                break;
            case "primary_address2":
                $acct['address2'] = $val;
                break;
            case "primary_city":
                $acct['city'] = $val;
                break;
            case "primary_state":
                $acct['state'] = $val;
                break;
            case "primary_zip":
                $acct['zip'] = $val;
                break;
            case "primary_country":
                $acct['country'] = $val;
                break;
            case "billing_address":
                $billing['address'] = $val;
                break;
            case "billing_address2":
                $billing['address2'] = $val;
                break;
            case "billing_city":
                $billing['city'] = $val;
                break;
            case "billing_state":
                $billing['state'] = $val;
                break;
            case "billing_zip":
                $billing['zip'] = $val;
                break;
            case "billing_country":
                $billing['country'] = $val;
                break;
            case "company_website":
                $acct_cstm['website'] = $val;
                break;
            case "company_revenue":
                $acct_cstm['revenue'] = $val;
                break;
            case "company_about":
                $acct_prof['aboutus'] = $val;
                break;

            // Misc. Company Data
            case "bols_per_mo":
                $acct_cstm['approx_bols_per_mo'] = $val;
                break;
            case "no_employees":
                $acct_cstm['no_employees'] = $val;
                break;
            case "no_drivers":
                $acct_prof['drivers'] = $val;
                break;
            case "no_trucks":
                $acct_prof['power_units'] = $val;
                break;
            case "no_trailers":
                $acct_cstm['no_trailers'] = $acct_prof['trailers'] = $val;
                break;
            case "no_parcels_day":
                $acct_cstm['no_parcels_day'] = $val;
                break;
            case "no_shipping_locations":
                $acct_cstm['no_shipping_locations'] = $val;
                break;
            case "approves_inbound":
                $acct_cstm['approves_inbound'] = $val;
                break;
            case "what_erp_used":
                $acct_cstm['what_erp_used'] = $val;
                break;
            case "birddog":
                $acct_cstm['birddog_referral'] = $val;
                break;
            case "status_notes":
                $acct_cstm['status_notes'] = $val;
                break;
            case "notes":
                $acct_cstm['notes'] = $val;
                break;
            case "internal_notes":
                $acct_cstm['notes_internal'] = $val;
                break;

            // User Data
            case "salutation":
                $user_cstm['salutation'] = $val;
                break;
            case "first_name":
                $user['first_name'] = $billing['first_name'] = $val;
                break;
            case "last_name":
                $user['last_name'] = $billing['last_name'] = $val;
                break;
            case "user_title":
                $user_prof['title'] = $val;
                break;
            case "user_about":
                $user_prof['about'] = $val;
                break;
            case "user_email":
                $user['email'] = $val;
                break;
            case "home_phone":
                $user_prof['phone'] = $val;
                break;
            case "mobile_phone":
                $user_cstm['mobile_phone'] = $val;
                break;
            case "direct_phone":
                $user_cstm['direct_phone'] = $val;
                break;
            case "user_fax":
                $user_prof['fax'] = $val;
                break;
            case "user_locale":
                $user['location'] = $val;
                break;
            case "user_website":
                $user_prof['website_url'] = $val;
                break;
            case "user_facebook":
                $user_prof['fb_url'] = $val;
                break;
            case "user_twitter":
                $user_prof['twitter_url'] = $val;
                break;
            case "user_linkedin":
                $user_prof['linkedin_url'] = $val;
                break;
            }
        }
        if(empty($acct['company_name']) || empty($user['first_name']) || empty($user['last_name'])) {
            continue;
        }
        $this->db = $this->load->database('crm_db', TRUE);
        if(isset($lead_type['name']) && ($name = $lead_type['name'])) {
            $count = $this->db->count_all("lead_types");
            $check = $this->db->get_where("lead_types", array("name" => $name));
            if($check->num_rows() < 1) {
                $this->db->insert("lead_types", array("name" => $name, "order" => $count));
                $ltype = $this->db->insert_id();
                $acct_cstm['lead_type'] = $acct['account_type'] = $user['company_type'] = $ltype;
            }
        }
        if(isset($lead_source['name']) && ($name = $lead_source['name'])) {
            $count = $this->db->count_all("lead_sources");
            $check = $this->db->get_where("lead_sources", array("name" => $name));
            if($check->num_rows() < 1) {
                $this->db->insert("lead_sources", array("name" => $name, "order" => $count));
                $acct_cstm['lead_source'] = $this->db->insert_id();
            }
        }
        if(isset($campaign['name']) && ($name = $campaign['name'])) {
            $check = $this->db->get_where("campaigns", array("name" => $name));
            if($check->num_rows() < 1) {
                $campaign['id'] = $acct_cstm['campaign'] = $this->Secure_m->generate_sugar_id();
                $campaign['date_entered'] = time();
                $campaign['date_modified'] = time();
                $campaign['modified_user_id'] = $this->session->userdata('id');
                $campaign['created_by'] = $this->session->userdata('id');
                $this->db->insert("campaigns", $campaign);
            }
        }
        if(isset($group['name']) && ($name = $group['name'])) {
            $order = $this->db->count_all("groups");
            $check = $this->db->get_where("groups", array("name" => $name));
            if($check->num_rows() < 1) {
                $this->db->insert("groups", array("name" => $name, "order" => $order));
                $acct_group['id'] = $this->db->insert_id();
            }
        }
        $mem = new stdclass;
        $uid = 0;
        if(is_array($user) && count($user)) {
            $where = "";
            if(!empty($user['phone'])) {
                $where .= "prof.phone = '{$user['phone']}' OR ";
                $where .= "cstm.mobile_phone = '{$user['phone']}' OR ";
                $where .= "cstm.direct_phone = '{$user['phone']}'";
            }
            if(!empty($user['mobile_phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "prof.phone = '{$user['mobile_phone']}' OR ";
                $where .= "cstm.mobile_phone = '{$user['mobile_phone']}' OR ";
                $where .= "cstm.direct_phone = '{$user['mobile_phone']}'";
            }
            if(!empty($user['direct_phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "prof.phone = '{$user['direct_phone']}' OR ";
                $where .= "cstm.mobile_phone = '{$user['direct_phone']}' OR ";
                $where .= "cstm.direct_phone = '{$user['direct_phone']}'";
            }
            $query = $this->db->query($this->Account_m->userQuery($where));
            $mem = reset($query->result());
            if($where && !empty($mem->id)) {
                $uid = $mem->id;
                $new = array();
                foreach($user as $k => $v) {
                    if(!empty($mem->$k)) {
                        $new[$k] = $mem->$k;
                        unset($user[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("leads", $user, array("id" => $uid));
                $user = $new;
            }
            else {
                $user['uxtime'] = time();
                $user['isclient'] = 0;
                $user['flag'] = 0;
                $user['activation_code'] = $this->Secure_m->generate_activate_id();
                $uid = $this->Secure_m->generate_activate_id(10);
                $query = $this->db->get_where("leads", array("id" => $uid), 1);
                $data = reset($query->result());
                while(!empty($data->id)) {
                    $uid = $this->Secure_m->generate_activate_id(10);
                    $query = $this->db->get_where("leads", array("id" => $uid), 1);
                    $data = reset($query->result());
                }
                $user['id'] = $uid;
                $this->db->insert("leads", $user);
            }
        }
        if($uid && is_array($user_prof) && count($user_prof)) {
            if(!empty($mem->uid)) {
                $new = array();
                foreach($user_prof as $k => $v) {
                    if(!empty($mem->$k)) {
                        $new[$k] = $mem->$k;
                        unset($user_prof[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("mprofiles", $user_prof, array("uid" => $uid));
                $user_prof = $new;
            }
            else {
                $user_prof['uid'] = $uid;
                $user_prof['flag'] = 0;
                $this->db->insert("ldetails", $user_prof);
            }
        }
        if($uid && is_array($user_cstm) && count($user_cstm)) {
            // Look up by the lead id; $cid is not assigned until the account section below
            $query = $this->db->get_where("leads_cstm", array("crm_id" => $uid), 1);
            $data = reset($query->result());
            if(!empty($data->crm_id)) {
                $new = array();
                foreach($user_cstm as $k => $v) {
                    if(!empty($mem->$k)) {
                        $new[$k] = $mem->$k;
                        unset($user_cstm[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("leads_cstm", $acct_prof, array("fa_user_id" => $cid));
                $user_cstm = $new;
            }
            else {
                $user_cstm['crm_id'] = $uid;
                $user_cstm['date_entered'] = time();
                $user_cstm['date_modified'] = time();
                $user_cstm['created_by'] = $this->session->userdata('id');
                $user_cstm['modified_user_id'] = $this->session->userdata('id');
                $this->db->insert("leads_cstm", $user_cstm);
            }
        }
        $cmp = new stdclass;
        $cid = 0;
        if(is_array($acct) && count($acct)) {
            $acct['uid'] = $uid;
            $acct['main_contact'] = "{$user['first_name']} {$user['last_name']}";
            if(!empty($user['email'])) {
                $acct['email'] = $user['email'];
            }
            $acct['isprospect'] = 0;
            $acct['flag'] = 0;
            // Start a fresh condition; $where may still hold the user lookup built above
            $where = "";
            if(!empty($acct['mcn'])) {
                $where .= "fms.mcn = '{$acct['mcn']}'";
            }
            if(!empty($acct['phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "fms.phone = '{$acct['phone']}' OR ";
                $where .= "crm.billing_phone = '{$acct['phone']}'";
            }
            if(!empty($acct['billing_phone'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "fms.phone = '{$acct['billing_phone']}' OR ";
                $where .= "crm.billing_phone = '{$acct['billing_phone']}'";
            }
            if(!empty($acct['company_name'])) {
                if($where) {
                    $where .= " OR ";
                }
                $where .= "fms.company_name = '{$acct['company_name']}'";
            }
            $query = $this->db->query($this->Account_m->acctQuery($where));
            $cmp = reset($query->result());
            if($where && !empty($cmp->id)) {
                $cid = $cmp->id;
                $new = array();
                foreach($acct as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("accounts", $billing, array("cid" => $cid));
                $acct = $new;
            }
            else {
                $cid = $this->Secure_m->generate_activate_id(10);
                // Check the new account id against the accounts table, as the loop below does
                $query = $this->db->get_where("accounts", array("id" => $cid), 1);
                $data = reset($query->result());
                while(!empty($data->id)) {
                    $cid = $this->Secure_m->generate_activate_id(10);
                    $query = $this->db->get_where("accounts", array("id" => $cid), 1);
                    $data = reset($query->result());
                }
                $acct['id'] = $cid;
                $this->db->insert("accounts", $acct);
            }
        }
        if($cid && is_array($acct_group) && count($acct_group)) {
            $grp = $this->db->get_where("accounts_groups", array("cid" => $cid, "gid" => $acct_group['id']));
            if(empty($cmp->id)) {
                $acct_group['cid'] = $cid;
                $this->db->insert("accounts_groups", $acct_group);
            }
        }
        if($cid && is_array($acct_prof) && count($acct_prof)) {
            if(!empty($cmp->id)) {
                $new = array();
                foreach($acct_prof as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct_prof[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("cprofiles", $acct_prof, array("cid" => $cid));
                $acct_prof = $new;
            }
            else {
                $acct_prof['cid'] = $cid;
                $acct_prof['flag'] = 0;
                $this->db->insert("adetails", $acct_prof);
            }
        }
        if($cid && is_array($billing) && count($billing)) {
            $bill = $this->db->get_where("accounts_billing", array("cid" => $cid));
            if(!empty($bill->id)) {
                $new = array();
                foreach($acct_prof as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct_prof[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("accounts_billing", $billing, array("cid" => $cid));
            }
            else {
                $billing['cid'] = $cid;
                $billing['flag'] = 0;
                $this->db->insert("accounts_billing", $billing);
            }
        }
        if($cid && $uid) {
            $this->db->update("leads", array("cid" => $cid), array("id" => $uid));
        }
        if($cid && is_array($acct_cstm) && count($acct_cstm)) {
            $query = $this->db->get_where("accounts_cstm", array("crm_id" => $cid), 1);
            $data = reset($query->result());
            if(!empty($data->crm_id)) {
                $new = array();
                foreach($acct_cstm as $k => $v) {
                    if(!empty($cmp->$k)) {
                        $new[$k] = $cmp->$k;
                        unset($acct_cstm[$k]);
                    }
                    else {
                        $new[$k] = $v;
                    }
                }
                //$this->db->update("accounts_cstm", $acct_cstm, array("crm_id" => $cid));
                $acct_cstm = $new;
            }
            else {
                $acct_cstm['crm_id'] = $cid;
                $acct_cstm['date_entered'] = time();
                $acct_cstm['date_modified'] = time();
                $acct_cstm['created_by'] = $this->session->userdata('id');
                $acct_cstm['modified_user_id'] = $this->session->userdata('id');
                if(empty($acct_cstm['rating'])) {
                    $acct_cstm['rating'] = 1;
                }
                $this->db->insert("accounts_cstm", $acct_cstm);
            }
        }
        $true = TRUE;
    }
    fclose($fd);

    return $true;
}

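Now, as far as I can tell, this script works fine, and the code itself has no errors. The problem is that after about 400-500 rows the script simply stops. I get no error, but no further code is processed.

I know this because I have code after it that is supposed to return a redirect page via AJAX. However, nothing after the loop in the importLeads function ever runs.

I'm not sure how to make this script more efficient. I'm fairly sure it is timing out, but I don't know how to make it run faster. I need this script to handle all of the information above separately. I have a variety of separate tables that all link together, and this import script has to set each of them up differently.

I have talked to my client about this project. The script works when I cut the file down to about 400 rows. He has a lot of these CSV files, each with about 75,000 rows. The one I'm importing is smaller, with only about 1,200 rows.

I've looked into alternatives, such as MySQL's import scripts, but I can't use them because this script must import the data into separate tables and must check for existing data first. I'm also supposed to update any empty fields with the imported information, but that will make things even worse.

If anyone knows a more efficient method, it would be much appreciated. I've tried to be as detailed as possible. It's worth noting that I'm using CodeIgniter, but if there is a more efficient method that doesn't use CodeIgniter, I'll take it (I can still put it into a CI model, though).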


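Answer:

I have written PHP scripts to bulk-load the data published in the Stack Overflow data dump. I import millions of rows and it doesn't take that long.

Here are some tips: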

  • Don’t rely on autocommit. The overhead of starting and committing a transaction for every row is enormous. Use explicit transactions, and commit after every 1000 rows (or more).

  • Use prepared statements. Since you are basically doing the same inserts thousands of times, you can prepare each insert before you start looping, and then execute during the loop, passing values as parameters. I don’t know how to do this with CodeIgniter’s database library, you’ll have to figure it out.

  • Tune MySQL for import. Increase cache buffers and so on. See Speed of INSERT Statements for more information.

  • Use LOAD DATA INFILE. If possible. It’s literally 20x faster than using INSERT to load data row by row. I understand if you can’t because you need to get the last insert id and so on. But in most cases, even if you read the CSV file, rearrange it and write it out to multiple temp CSV files, the data load is still faster than using INSERT.

  • Do it offline. Don’t run long-running tasks during a web request. The time limit of a PHP request will terminate the job, if not today then next Tuesday when the job is 10% longer. Instead, make the web request queue the job, and then return control to the user. You should run the data import as a server process, and periodically allow the user to glimpse the rate of progress. For instance, a cheap way to do this is for your import script to output “.” to a temp file, and then the user can request to view the temp file and keep reloading in their browser. If you want to get fancy, do something with Ajax.

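The first two tips (explicit transactions and prepared statements) can be combined. What follows is a minimal sketch, not the answerer's code: it assumes PDO rather than CodeIgniter's database library, which the answer leaves open. The function name importLeadsBatched and the leads table with first_name and last_name columns are hypothetical.

```php
<?php
/**
 * Insert rows with one prepared statement, committing every $batchSize rows.
 * Assumption: a hypothetical table leads(first_name, last_name).
 * Returns the number of rows inserted.
 */
function importLeadsBatched(PDO $pdo, iterable $rows, int $batchSize = 1000): int
{
    // Make PDO throw on errors so a failed insert cannot pass silently.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Prepare once, before the loop; only the bound values change per row.
    $insert = $pdo->prepare(
        'INSERT INTO leads (first_name, last_name) VALUES (:first_name, :last_name)'
    );

    $count = 0;
    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            $insert->execute([
                ':first_name' => $row['first_name'],
                ':last_name'  => $row['last_name'],
            ]);
            $count++;

            // Commit the current batch and open a new transaction.
            if ($count % $batchSize === 0) {
                $pdo->commit();
                $pdo->beginTransaction();
            }
        }
        // Commit the final, possibly partial, batch.
        $pdo->commit();
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack();
        }
        throw $e;
    }

    return $count;
}
```

If a row fails, only the current uncommitted batch is rolled back; batches committed earlier stay in the table, so a rerun has to tolerate rows that were already imported.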

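The LOAD DATA INFILE tip mentions rearranging the CSV into several temporary CSV files, one per table. A rough sketch of that splitting step is below. The function name splitCsvByTable and the table and column names a caller passes in are hypothetical, and the sketch assumes any ids linking the tables already exist as columns or are generated by the script beforehand.

```php
<?php
/**
 * Split one source CSV into one temporary CSV per target table.
 * $tableColumns maps a (hypothetical) table name to the source headers it receives.
 * Returns [table name => path of the temp CSV written for it].
 */
function splitCsvByTable(string $sourcePath, array $tableColumns, string $tmpDir): array
{
    $in = fopen($sourcePath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $sourcePath");
    }

    // The first line holds the headers; map header name => column position.
    // The escape argument is passed as '' so quotes are handled by doubling only.
    $header = fgetcsv($in, 0, ',', '"', '');
    if (!is_array($header) || $header === [null]) {
        fclose($in);
        throw new RuntimeException('Missing header row');
    }
    $index = array_flip($header);

    // Open one output file per table.
    $paths = [];
    $outs = [];
    foreach ($tableColumns as $table => $columns) {
        $paths[$table] = $tmpDir . DIRECTORY_SEPARATOR . $table . '_' . uniqid() . '.csv';
        $outs[$table] = fopen($paths[$table], 'w');
        if ($outs[$table] === false) {
            throw new RuntimeException("Cannot write {$paths[$table]}");
        }
    }

    while (($row = fgetcsv($in, 0, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue; // skip blank lines
        }
        foreach ($tableColumns as $table => $columns) {
            $fields = [];
            foreach ($columns as $column) {
                // Unknown or missing columns become empty strings.
                $fields[] = isset($index[$column]) ? ($row[$index[$column]] ?? '') : '';
            }
            fputcsv($outs[$table], $fields, ',', '"', '');
        }
    }

    fclose($in);
    foreach ($outs as $out) {
        fclose($out);
    }
    return $paths;
}

// Each temp file could then be loaded with a statement along these lines
// (path, table, and column list are placeholders):
//   LOAD DATA LOCAL INFILE '/tmp/leads_xxx.csv' INTO TABLE leads
//   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY ''
//   LINES TERMINATED BY '\n' (first_name, last_name);
```

Checking for existing rows, which the question requires, is not covered here; one option is to load into staging tables first and merge from there with SQL.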

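The "do it offline" tip could look roughly like the sketch below: the web request only queues a job, a command-line worker does the import, and the browser polls a progress file. All function names, the queue directory layout, and the file extensions are hypothetical. The sketch writes a row count to the progress file instead of the "." characters the answer suggests.

```php
<?php
/** Called from the web request: record the job and return immediately. */
function queueImport(string $queueDir, string $csvPath): string
{
    $jobId = bin2hex(random_bytes(8));
    file_put_contents("$queueDir/$jobId.job", json_encode(['csv' => $csvPath]));
    return $jobId;
}

/**
 * Worker: meant to run from the command line (for example via cron),
 * so the web request time limit does not apply.
 * $importRow is whatever per-row import logic the application uses.
 */
function runImportJob(string $queueDir, string $jobId, callable $importRow): int
{
    $job = json_decode(file_get_contents("$queueDir/$jobId.job"), true);
    $in = fopen($job['csv'], 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open {$job['csv']}");
    }
    $progress = "$queueDir/$jobId.progress";

    fgetcsv($in, 0, ',', '"', ''); // skip the header row
    $done = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue; // skip blank lines
        }
        $importRow($row);
        $done++;
        // Publish progress every 100 rows to keep file writes cheap.
        if ($done % 100 === 0) {
            file_put_contents($progress, (string) $done);
        }
    }
    fclose($in);

    file_put_contents($progress, "done:$done");
    unlink("$queueDir/$jobId.job");
    return $done;
}

/** Polled by the browser (for example through Ajax) to show progress. */
function readProgress(string $queueDir, string $jobId): string
{
    $file = "$queueDir/$jobId.progress";
    return is_file($file) ? file_get_contents($file) : 'queued';
}
```

A controller would call queueImport() and return the job id to the page, which then requests readProgress() periodically until the value starts with "done:".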