我的Clojure代码大部分时间都在执行数据转换。 但是,我看不到要转换的数据的形状–我必须知道数据在输入中的外观,并掌握它们在每一步如何变化的思维模型。 但是我会犯错。 我在代码中犯了错误,因此数据不再与应遵循的模型相对应。 而且我在思维模型中对当前数据的外观犯了错误,以后可能会导致代码错误。 最终结果是相同的– 在以后的步骤中,关于数据形状错误的情况会有所帮助。 这里有两个问题:错误通常提供的有用信息太少,并且通常比实际的代码/模型错误出现的时间晚。 因此,我很容易花一个小时或更长时间来解决这些错误。 除此之外,也很难读取这样的代码,因为读者缺乏作者的数据思维模型,必须自己导出数据-这非常困难,特别是如果首先不清楚输入数据的形状的话。
我应该提到我当然在REPL中编写测试和实验,但是我仍然遇到这些问题,因此对我来说还不够。 测试无法保护我免受输入数据模型错误的影响(因为我基于与代码相同的假设编写了[单元]测试,并且仅在整合所有位时才发现错误),即使它们有助于发现一个错误。错误,这仍然是耗时的根本原因。
我可以做得更好吗? 我相信我可以。
很难解决具有延迟表现的错误,并且难以理解的代码只能传达故事的一半(转换,而不转换要转换的数据的形状),这是我们为动态键入功能付出的代价。 但是有降低价格的策略。 我想介绍其中的三个:具有名声的小型,集中功能,作为文档的分解以及明智地使用前置条件和后置条件。
这篇文章的内容是基于我在UlisesCerviñoBeresi慷慨提供的免费Clojure办公时间之一(与美国Leif类似)中从中学到的。
因此,我们需要使数据的形状更加明显并且快速失败,最好提供有用的错误消息。
主要思想是:
- 将转换分解为名称清晰的小型简单函数
- 在函数参数中使用解构来记录期望的数据
- 同时使用前置条件和后置条件(和/或断言)作为检查和文档
- (您已经在REPL中进行的所有测试和交互式探索。)
简化的例子
我们有一个网上商店,出售折扣车。 我们也偶尔会推出某些促销活动,以提高精选汽车的折扣率。 对于每辆汽车,我们还可以使用许多关键字来查找它及其所属的类别。 以下是处理原始汽车+广告活动+来自数据库查询的搜索关键字数据的代码,首先是原始查询,然后是经过检查的重构查询:
(require '[cheshire.core :refer [parse-string]])
(defn- json->data [key m]
(update-in m [key] #(parse-string % true)))
(defn- select-campaign [car+campaigns]
(first car+campaigns))
(defn- jdbc-array-to-set
[key m]
(update-in m [key] #(apply hash-set (.getArray %))))
(defn refine-cars-simple
"Process get-cars query result set - derive additional data, transform values into better ones
There is one row per car and campaign, a car may have more campaigns - we pick the best one.
"
[cars-raw]
(->>
cars-raw
(map (partial json->data :json)) ;; [[car1 ..][car2 ..]]
(group-by :id)
(vals)
;; join all car1 vectors into one car ..
(map select-campaign)
(map (fn [car]
(->>
car
(jdbc-array-to-set :category_ref)
(jdbc-array-to-set :keywords))))))
(require '[cheshire.core :refer [parse-string]]) (require ']) (defn- car? [{:keys [id] :as car}] (and (map? car) id)) (defn- json->data [key m] {:pre [(contains? m key) (string? (get m key))], :post [(map? (get % key))]} (update-in m [key] parse-string true)) (defn- select-campaign [[first-car :as all]] {:pre [(sequential? all) (car? first-car)], :post [(car? %)]} first-car) (defn- jdbc-array-to-set [key m] {:pre [(contains? m key) (instance? java.sql.Array (get m key))], :post [(set? (get % key))]} (update-in m [key] #(apply hash-set (.getArray %)))) (defn group-rows-by-car [cars-raw] {:pre [(sequential? cars-raw) (car? (first cars-raw))] :post [(sequential? %) (vector? (first %))]} (vals (group-by :id cars-raw))) (defn refine-car [car] {:pre [(car? car) (:keywords car) (:category_ref car)]} (->> car (jdbc-array-to-set :category_ref) (jdbc-array-to-set :keywords))) (defn refine-cars-simple "Process get-cars query result set - derive additional data, transform values into better ones There is one row per car and campaign, a car may have more campaigns - we pick the best one. " [cars-raw] (->> cars-raw (map (partial json->data :json)) ;;A real-world example
We have a webshop that sells discounted cars. Each car we sell has a base discount (either an absolute amount or percentage) and we also have occasional campaigns for selected cars. For each car we have also a number of keywords people can use to find it.
Original code
Below is code that processes raw car + campaigns + search keywords data from a DB query, selected the best applicable campaign and computing the final discount:
(require '[cheshire.core :refer [parse-string]]) (defn db-array [col] (reify java.sql.Array (getArray [_] (object-array col)))) (defn- json->data [data fields] (map (fn [data] (reduce (fn [data field] (assoc data field (parse-string (get data field) true))) data fields)) data)) (defn- discount-size [discount] (if (or (> (:amount discount) 10000) (> (:percent discount) 5)) :high :normal)) (defn- jdbc-array-to-set "Convert a PostgreSQL JDBC4Array inside the map `m` - at the key `k` - into a se" [key m] (update-in m [key] #(some->> % (.getArray) (into #{})))) (defn- compute-discount "Derive the :discount map based on the car's own discount and its active campaign, if applicable" [car] (let [{:keys [json campaign_discount_amount campaign_discount_percent]} car {:keys [discount_amount discount_percent]} json discount? (:use_campaign car) amount (if discount? (apply + (remove nil? [discount_amount campaign_discount_amount])) discount_amount) percent (if discount? (apply + (remove nil? [discount_percent campaign_discount_percent])) discount_percent) discount {:amount amount :percent percent} discount-size (discount-size discount) ] (assoc car :discount discount :discount-size discount-size))) (defn merge-campaigns "Given vector of car+campaign for a particular car, return a single car map with a selected campaign. " [car+campaigns] {:pre [(sequential? car+campaigns) :id]} (let [campaign-ks [:use_campaign :campaign_discount_amount :campaign_discount_percent :active] car (apply dissoc (first car+campaigns) campaign-ks) campaign (->> car+campaigns (map #(select-keys % campaign-ks)) (filter :active) (sort-by :use_campaign) ;; true, if any, will be last last)] (assoc car :best-campaign campaign))) (defn refine-cars "Process get-cars query result set - derive additional data, transform values into better ones There is one row per car and campaign, a car may have more campaigns - we pick the best one. " [cars-raw] (->> cars-raw (#(json->data % [:json])) (#(map merge-campaigns (vals (group-by :id %)))) (map (comp ;; reminder - the 1st fn is executed last compute-discount (fn [m] (update-in m [:keywords] (partial remove nil?))) ;; {NULL} => [] (partial jdbc-array-to-set :keywords) (partial jdbc-array-to-set :category_ref) )) )) (refine-cars [ {:id 1 :json "{\"discount_amount\":9000,\"discount_percent\":0}" :campaign_discount_amount 2000 :campaign_discount_percent nil :use_campaign false :active true :keywords (db-array ["fast"]) :category_ref (db-array [])}])
Defects and me
I had originally two [discovered] errors in the code and both took me quite a while to fix – first I forgot to convert JSON from string into a map (wrong assumption about input data) and then I run merge-campaigns directly on the list of car+campaign lists instead of mapping it (the sequential? precondition did not help to detect this error). So the transformations are clearly too error-prone.
The stack traces did not contain enough helpful context info (though a more experienced Clojurist would have certainly found and fixed the root causes much faster):
## Forgotten ->json: java.lang.NullPointerException: clojure.lang.Numbers.ops Numbers.java: 961 clojure.lang.Numbers.gt Numbers.java: 227 clojure.lang.Numbers.gt Numbers.java: 3787 core/discount-size cars.clj: 13 core/compute-discount cars.clj: 36 ------------- ## Forgotten (map ..): java.lang.ClassCastException: clojure.lang.PersistentVector cannot be cast to clojure.lang.IPersistentMap RT.java:758 clojure.lang.RT.dissoc core.clj:1434 clojure.core/dissoc core.clj:1436 clojure.core/dissoc RestFn.java:142 clojure.lang.RestFn.applyTo core.clj:626 clojure.core/apply cars.clj:36 merge-campaigns ...
Refactored
This is the code refactored into smaller functions with checks (and it certainly can be improved much more):
(require '[cheshire.core :refer [parse-string]]) (require ']) (defn db-array [col] (reify java.sql.Array (getArray [_] (object-array col)))) (defn- json->data [data fields] {:pre [(sequential? data) (sequential? fields)]} (map (fn [data] (reduce (fn to-json [data field] {:pre [(map? data) (string? (get data field)) (keyword? field)]} (assoc data field (parse-string (get data field) true))) data fields)) data)) (defn- discount-size [{:keys [amount percent] :as discount}] {:pre [(number? amount) (number? percent) (<= 0 amount) (<= 0 percent 100)] :post [(#{:high :normal} %)]} (if (or (> amount 10000) (> percent 5)) :high :normal)) (defn- jdbc-array-to-set "Convert a PostgreSQL JDBC4Array inside the map `m` - at the key `k` - into a se" [key m] {:pre [(keyword? key) (map? m) (let [a (key m)] (or (nil? a) (instance? java.sql.Array a)))]} (update-in m [key] #(some->> % (.getArray) (into #{})))) (defn car? [{:keys [id] :as car}] (and (map? car) id)) (defn- compute-discount "Derive the :discount map based on the car's own discount and its active campaign, if applicable" [{{:keys [discount_amount discount_percent] :as json} :json :keys [campaign_discount_amount campaign_discount_percent] :as car}] {:pre [(car? car) (map? json) (number? discount_amount) (number? discount_percent)] :post [(:discount %) (:discount-size %)]} (let [discount? (:use_campaign car) amount (if discount? (apply + (remove nil? [discount_amount campaign_discount_amount])) discount_amount) percent (if discount? (apply + (remove nil? [discount_percent campaign_discount_percent])) discount_percent) discount {:amount amount :percent percent} discount-size (discount-size discount) ] (assoc car :discount discount :discount-size discount-size))) (defn select-campaign "Return a single car map with a selected campaign." [{:keys [campaigns] :as car}] {:pre [(car? car) (sequential? campaigns)] :post [(contains? % :best-campaign)]} (let [best-campaign (->> campaigns (filter :active) (sort-by :use_campaign) ;; true, if any, will be last last)] (-> car (dissoc :campaigns) (assoc :best-campaign best-campaign)))) (defn nest-campaign [car] ;; :pre check for campaing keys would require too much repetition => an assert instead {:pre [(car? car)] :post [((comp map? :campaign) %)]} (let [ks (set (keys car)) campaign-ks #{:campaign_discount_amount :campaign_discount_percent :use_campaign :active} campaign (select-keys car campaign-ks)] (assert (subset? campaign-ks ks) (str "Campaign keys missing from the car " (:id car) ": " (difference campaign-ks ks))) (-> (apply dissoc car campaign-ks) (assoc :campaign campaign)))) (defn group-rows-by-car [cars-raw] {:pre [(sequential? cars-raw) (every? map? cars-raw)] :post [(sequential? %) (every? vector? %)]} (vals (group-by :id cars-raw))) (defn join-campaigns [[car+campaign :as all]] {:pre [(sequential? all) (:campaign car+campaign)] :post [(:campaigns %)]} (-> car+campaign (assoc :campaigns (map :campaign all)) (dissoc :campaign))) (defn refine-car [car] {:pre [(car? car)] :post [(:discount %)]} ; keywords and :category_ref are optional (->> car (jdbc-array-to-set :category_ref) (jdbc-array-to-set :keywords) (#(update-in % [:keywords] (partial remove nil?))) ;; {NULL} => [] (select-campaign) (compute-discount))) (defn refine-cars "Process get-cars query result set - derive additional data, transform values into better ones There is one row per car and campaign, a car may have more campaigns - we pick the best one. " [cars-raw] (->> cars-raw (#(json->data % [:json])) (map nest-campaign) (group-rows-by-car) (map join-campaigns) (map refine-car) )) (refine-cars [ {:id 1 :json "{\"discount_amount\":9000,\"discount_percent\":0}" :campaign_discount_amount 2000 :campaign_discount_percent nil :use_campaign false :active true :keywords (db-array ["fast"]) :category_ref (db-array [])}])
Downsides
The main problem with pre- and post-conditions is that they do not provide any useful context in their error message and do not support adding a custom message. An error like
Assert failed: (let [a (key m)] (or (nil? a) (instance? java.sql.Array a))) cars.clj:18 user/jdbc-array-to-set
is better than not failing fast but does not tell as what the invalid value was and which of the thousands of cars had the invalid value.
Also, the checks are performed at runtime so they have a performance cost. This might not be a problem with checks such as (map?) but could be with f.ex. (every?).
What about duplication?
Do you repeat the same checks again and again? Then you could either copy them using
with-meta
(they end-up in metadata anyway) or reuse the explicitely:(defn with-valid-car [f] (fn [car] {:pre [:make :model :year]} (f car))) (def count-price (with-valid-car (fn [car] (do-something car)))) ;; or make & use a macro to make it nicer
What about static types
This looks arguably like a good case for static types. And yes, I come from Java and lack them. On the other hand, even though static typing would solve the main category of problems, it creates new ones and has its liits.
A) I have actually quite a number of “types” here so it would require lot of classes to model fully:
- Raw data from the DB – car with campaign fields and keywords, category_ref as java.sql.Array
- Car with keywords as a sequence
- Car with category_ref as a sequence
- Car with a nested :campaign “object”
- Car with a nested :best-campaign object and with :rate (you could have :rate there from start, set initially to nil, but then you’d still need to ensure that the final function sets it to a value)
B) A key strength of Clojure is the use of generic data structures – maps, vectors, lazy sequences – and powerful, easy to combine generic functions operating on them. It makes integrating libraries very easy since everything is just a map (and not a custom type that needs to be converted) and you can always transform these with your old good friends functions – whether it is a Korma SQL query definition, result set, or a HTTP request. Static types take this away.
C) Types permit only a subset of checks that you might need (that is unless you use Haskell!) – they can check that a thing is a car but not that a return value is in the range 7 … 42.
D) Some functions do not care about the type, only its small part – f.ex. jdbc-array-to-set only cares about the argument being a map, having the key, and if set, the value being a java.sql.Array .
What else is out there?
- 核心类型
- 棱柱形/模式
- 与core.contracts进行合同编程
结论
使用更小的函数和前后条件,我可以更早地发现错误,并且可以更好地记录数据的预期形状,甚至可以通过fn签名的破坏来实现。 前后条件中有一些重复,并且错误消息几乎没有帮助,但更好。 我猜想更复杂的情况可能会警告使用core.contracts甚至core.typed / schema。
您使用什么策略? 你会改善什么? 其他的建议?
我鼓励您分叉并改进要点,并分享自己的看法。
更新
- 劳伦斯Krubner建议使用可怕的拍摄参数和返回值,以提供一个有用的错误消息
- Alf Kristian建议添加更多测试和集成测试,如果还不够,请使用core.typed而不是:pre和:post( 示例 )