MIT6.005 Problem Set 1 Tweet Tweet

空浩淼

2023-12-01

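Test-first programming

Study the method specification carefully.
Write unit tests for the method based on the specification.
Implement the method according to the specification.
Revise the implementation and the test cases until all tests pass.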

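Requirements

Choose test cases by partitioning the input and output spaces.
Document the testing strategy for each test method.
Keep the test cases few but well chosen.
Test cases must strictly follow the specification, so that every correct implementation produces the same test results.
Put one test case in each test method.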

Problem 1: Extracting data from tweets

Requirements

Design, document, and implement test cases for getTimespan() and getMentionedUsers().
Implement both methods and pass the tests.

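Implementation

getTimespan

This method returns the time span that covers when the input tweets were sent.

Test cases

The method has the following signature:

getTimespan: List<Tweet> -> Timespan

There is only one input parameter, a List, so partitioning by its length is enough: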

/*
 * getTimespan(tweets) test strategy:
 *
 *      Partition the input as follows:
 *      tweets.size():  0, 1, >1
 *
 *      Full Cartesian product
 */

Method implementation

A single pass that tracks the earliest and the latest timestamp is enough.

public static Timespan getTimespan(List<Tweet> tweets) {
    // an empty list has no timestamps to span, so it is rejected here
    if (tweets.isEmpty()) {
        throw new IllegalArgumentException("tweets must not be empty");
    }
    Instant start = tweets.get(0).getTimestamp();
    Instant end = tweets.get(0).getTimestamp();
    // one pass: track the earliest and the latest timestamp seen so far
    for (Tweet tweet : tweets) {
        Instant current = tweet.getTimestamp();
        if (current.isBefore(start)) start = current;
        if (current.isAfter(end)) end = current;
    }
    return new Timespan(start, end);
}
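
The one-pass scan can be tried out without the problem set's Tweet and Timespan classes. The sketch below works directly on Instant values; TimespanSketch and bounds are hypothetical names used only for illustration, and the two calls in main exercise the size 1 and size >1 partitions from the test strategy.

```java
import java.time.Instant;
import java.util.List;

public class TimespanSketch {
    /** Returns {earliest, latest} for a non-empty list of instants in one pass. */
    public static Instant[] bounds(List<Instant> times) {
        if (times.isEmpty()) {
            throw new IllegalArgumentException("times must not be empty");
        }
        Instant start = times.get(0);
        Instant end = times.get(0);
        for (Instant t : times) {
            if (t.isBefore(start)) start = t;
            if (t.isAfter(end)) end = t;
        }
        return new Instant[] { start, end };
    }

    public static void main(String[] args) {
        Instant d1 = Instant.parse("2016-02-17T10:00:00Z");
        Instant d2 = Instant.parse("2016-02-17T11:00:00Z");
        Instant d3 = Instant.parse("2016-02-17T12:00:00Z");
        Instant[] one = bounds(List.of(d2));           // partition: size 1
        Instant[] many = bounds(List.of(d2, d3, d1));  // partition: size > 1
        System.out.println(one[0] + " .. " + one[1]);
        System.out.println(many[0] + " .. " + many[1]);
    }
}
```

With a single element, both bounds are that element, which is why the size 1 partition deserves its own test.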

getMentionedUsers

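This method returns the set of usernames mentioned in the tweets. A mention is a username that follows an @. The character immediately before the @ and the character immediately after the username must not be characters that can appear in a username; otherwise the name does not belong in the result set.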

getMentionedUsers: List<Tweet> -> Set<String>

Test cases

The partitions consider the length of the List, the number of names mentioned in the tweets, and repeated names.

/*
 * getMentionedUsers(tweets) test strategy:
 *
 *      Partition the input as follows:
 *      tweets.size(): 0, 1, >1
 *      number of mentioned names: 0, 1, >1
 *      mentioned names: some differ only in case, or none do
 *
 *      Cover each part
 */

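Method implementation

We need to extract the names that match the format, and I used a regular expression for this. In addition, every name in the result set is converted to lower case, which makes duplicates easy to detect and still satisfies the method's specification.

First implement a helper, getMentionedUsersInOneTweet, that collects the set of names from a single tweet, then merge those sets.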

Note: the pattern uses a negative lookbehind instead of consuming the characters around a mention, so an @xxx at the very beginning or end of the text, and adjacent mentions such as "@a @b", are all matched. A pattern that consumes the neighboring characters would miss those cases.

Add these member fields:

private static final String nameLetters = "A-Za-z0-9_-";
// "@" must not follow a username character; the greedy "+" takes the longest
// run of username characters, so the name is never followed by one
private static final Pattern pattern = Pattern.compile
    ("(?<![" + nameLetters + "])@([" + nameLetters + "]+)");

The implementation is as follows:

public static Set<String> getMentionedUsers(List<Tweet> tweets) {
    Set<String> set = new HashSet<>();
    for (Tweet tweet : tweets) {
        set.addAll(getMentionedUsersInOneTweet(tweet));
    }
    return set;
}
public static Set<String> getMentionedUsersInOneTweet(Tweet tweet) {
    Set<String> set = new HashSet<>();
    Matcher matcher = pattern.matcher(tweet.getText());
    while (matcher.find()) {
        // group 1 is the name without the leading "@";
        // convert to lower case so duplicates collapse
        set.add(matcher.group(1).toLowerCase());
    }
    return set;
}
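
The boundary rule for mentions can be checked on plain strings. The sketch below is one possible way to express it, using a lookbehind-based pattern; MentionSketch and mentions are hypothetical names, and the username alphabet (letters, digits, underscore, hyphen) is taken from the nameLetters field above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MentionSketch {
    // '@' must not follow a username character; the greedy '+' then takes the
    // longest run of username characters, so the name cannot be followed by one.
    private static final Pattern MENTION =
            Pattern.compile("(?<![A-Za-z0-9_-])@([A-Za-z0-9_-]+)");

    /** Returns the lower-cased usernames mentioned in text. */
    public static Set<String> mentions(String text) {
        Set<String> names = new HashSet<>();
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            names.add(m.group(1).toLowerCase());
        }
        return names;
    }

    public static void main(String[] args) {
        // mention at the start, mention before punctuation, and an email address
        System.out.println(mentions("@Alice hi @BOB, mail bitdiddle@mit.edu"));
        // adjacent mentions that differ only in case
        System.out.println(mentions("thanks @alice @ALICE"));
    }
}
```

The email address contributes nothing because its @ directly follows a username character, while the two spellings of alice collapse into a single entry.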

Problem 2: Filtering lists of tweets

Requirements

Implement the functions:

writtenBy()
inTimespan()
containing()

along with their test cases, and pass the tests.

writtenBy

Find the tweets in the list that were written by a particular author.

writtenBy: List<Tweet> * String -> List<Tweet>

Test cases

There are two parameters to consider.

/*
 * writtenBy test strategy:
 *      partition input into:
 *      list size: 0, >0
 *      name length: 0, >0
 *      name contains illegal char or not
 *
 *      Cover each part
 */

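Method implementation

A name that contains an illegal character or has length 0 is treated as invalid input. Author names are compared case-insensitively, because Twitter usernames are not case-sensitive.

public static List<Tweet> writtenBy(List<Tweet> tweets, String username) {
    // any character outside the username alphabet makes the name invalid
    Pattern pattern = Pattern.compile("[^A-Za-z0-9_-]");
    Matcher matcher = pattern.matcher(username);
    if (matcher.find() || username.length() == 0) {
        throw new IllegalArgumentException("invalid username: " + username);
    }
    List<Tweet> list = new LinkedList<>();
    for (Tweet tweet : tweets) {
        // usernames are case-insensitive
        if (tweet.getAuthor().equalsIgnoreCase(username)) {
            list.add(tweet);
        }
    }
    return list;
}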

inTimespan

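Given a list of tweets and a time span, find the tweets sent during that time span. The returned tweets keep the same relative order as in the input.

inTimespan: List<Tweet> * Timespan -> List<Tweet>

Test cases

There are two parameters, and the strategy is designed as follows: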

/*     
 * inTimespan test strategy:
 *      partition input into:
 *      list size: 0, >0
 *      timespan: start==end or not
 *
 *      Full Cartesian product
 */

Method implementation

The implementation treats the time span as the closed interval [start, end], since Timespan is documented as including its endpoints.

public static List<Tweet> inTimespan(List<Tweet> tweets, Timespan timespan) {
    List<Tweet> list = new LinkedList<>();
    Instant start = timespan.getStart();
    Instant end = timespan.getEnd();
    for (Tweet tweet : tweets) {
        Instant current = tweet.getTimestamp();
        // keep the tweet when start <= current <= end
        if (!current.isBefore(start) && !current.isAfter(end)) {
            list.add(tweet);
        }
    }
    return list;
}
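
The boundary behavior is the easiest part of this method to get wrong, so here is a sketch of the membership test on bare Instant values. It assumes the time span includes both endpoints; for a half-open interval, the second comparison would become t.isBefore(end). IntervalSketch, within, and filter are hypothetical names.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class IntervalSketch {
    /** True if t lies in the closed interval [start, end]. */
    public static boolean within(Instant t, Instant start, Instant end) {
        return !t.isBefore(start) && !t.isAfter(end);
    }

    /** Keeps the instants inside [start, end], preserving input order. */
    public static List<Instant> filter(List<Instant> times, Instant start, Instant end) {
        List<Instant> kept = new ArrayList<>();
        for (Instant t : times) {
            if (within(t, start, end)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant d1 = Instant.parse("2016-02-17T10:00:00Z");
        Instant d2 = Instant.parse("2016-02-17T11:00:00Z");
        Instant d3 = Instant.parse("2016-02-17T12:00:00Z");
        // both endpoints are kept, in input order
        System.out.println(filter(List.of(d3, d1, d2), d2, d3));
        // start == end still admits a tweet sent at exactly that instant
        System.out.println(filter(List.of(d1, d2), d2, d2));
    }
}
```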

containing

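Find the tweets that contain any of the given words, ignoring case.

containing: List<Tweet> * List<String> -> List<Tweet>

Test cases

Partition by the sizes of the tweet list and the word list, whether a word contains a space, and the case of the words: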

/*
 * containing test strategy:
 *      partition input into:
 *      tweets list size: 0, >0
 *      words list size: 0, >0
 *      words list contains an illegal word or not
 *      words list contains a word whose case differs from the tweet text or not
 */

Method implementation

public static List<Tweet> containing(List<Tweet> tweets, List<String> words) {
    Set<String> lowerWords = new HashSet<>();
    for (String word : words) {
        if (word.contains(" ")) {
            throw new IllegalArgumentException("illegal space in " + word);
        }
        lowerWords.add(word.toLowerCase());
    }
    List<Tweet> list = new LinkedList<>();
    for (Tweet tweet : tweets) {
        // compare whole words, so "talk" does not match "talking"
        for (String token : tweet.getText().toLowerCase().split("\\s+")) {
            if (lowerWords.contains(token)) {
                list.add(tweet);
                break;
            }
        }
    }
    return list;
}
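
Whether a tweet "contains a word" depends on how words are delimited. The sketch below assumes a word is a maximal run of non-whitespace characters and compares whole words case-insensitively, which differs from a plain substring search. ContainingSketch and containsWord are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class ContainingSketch {
    /** True if text contains at least one of words as a whole word, ignoring case. */
    public static boolean containsWord(String text, List<String> words) {
        // split on runs of whitespace; a leading empty token is harmless
        // because the words being searched for are non-empty
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        for (String word : words) {
            if (tokens.contains(word.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Let's TALK now", List.of("talk")));
        System.out.println(containsWord("Talking about it", List.of("talk")));
    }
}
```

A substring search would accept the second text because "talking" contains "talk", while the whole-word comparison rejects it.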

Problem 3: Inferring a social network

Requirements

Implement

guessFollowsGraph()
influencers()

guessFollowsGraph

Build a map of <follower, authors> key-value pairs, so that map[A] is the set of users that A follows.

guessFollowsGraph: List<Tweet> -> Map<String, Set<String>>

Test cases

Partitioning by the length of the input list is enough:

/*
 * guessFollowsGraph test strategy:
 *      partition input into:
 *      input list size: 0, >0
 *      
 *      Full Cartesian product
 */

Method implementation

public static Map<String, Set<String>> guessFollowsGraph(List<Tweet> tweets) {
    Map<String, Set<String>> map = new HashMap<>();
    for (Tweet tweet : tweets) {
        // usernames are case-insensitive, so normalize the key to lower case
        String author = tweet.getAuthor().toLowerCase();
        Set<String> mentionedUsers = Extract.getMentionedUsersInOneTweet(tweet);
        // users cannot follow themselves
        mentionedUsers.remove(author);
        if (!mentionedUsers.isEmpty()) {
            map.computeIfAbsent(author, k -> new HashSet<>()).addAll(mentionedUsers);
        }
    }
    return map;
}
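
The graph construction can be illustrated without the Tweet class. The sketch below takes each post as a hypothetical {author, text} string pair, lower-cases every username, and skips self-mentions; FollowsSketch and follows are illustrative names, and the mention pattern is one possible encoding of the mention rule.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FollowsSketch {
    private static final Pattern MENTION =
            Pattern.compile("(?<![A-Za-z0-9_-])@([A-Za-z0-9_-]+)");

    /** Each element of posts is a hypothetical {author, text} pair. */
    public static Map<String, Set<String>> follows(List<String[]> posts) {
        Map<String, Set<String>> graph = new HashMap<>();
        for (String[] post : posts) {
            String author = post[0].toLowerCase();
            Matcher m = MENTION.matcher(post[1]);
            while (m.find()) {
                String name = m.group(1).toLowerCase();
                if (!name.equals(author)) { // skip self-mentions
                    graph.computeIfAbsent(author, k -> new HashSet<>()).add(name);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        List<String[]> posts = List.of(
                new String[] { "Ernie", "hi @bert" },
                new String[] { "ernie", "@Bert and @ernie and @elmo" },
                new String[] { "bert", "no mentions" });
        System.out.println(follows(posts));
    }
}
```

Here both spellings of ernie end up under one key, and bert does not appear as a key because he mentions nobody.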

influencers

Find the people with the most followers; there may be more than one.

influencers: Map<String, Set<String>> -> List<String>

Test cases

/*
 * influencers test strategy:
 *      partition input into:
 *      input map size: 0, >0
 *      
 *      Full Cartesian product
 */

Method implementation

The map returned by guessFollowsGraph() holds <follower, authors> pairs, while here we need the number of followers per author. So the map is inverted first, and then the people with the most followers are added to the list.

public static List<String> influencers(Map<String, Set<String>> followsGraph) {
    Map<String, Integer> followerCounts = new HashMap<>();
    int maxFollowers = 0;
    // invert the graph: count how many followers each author has
    for (Set<String> authors : followsGraph.values()) {
        for (String s : authors) {
            int count = followerCounts.merge(s, 1, Integer::sum);
            maxFollowers = Math.max(maxFollowers, count);
        }
    }
    // collect every author whose count equals the maximum
    List<String> list = new LinkedList<>();
    for (Map.Entry<String, Integer> entry : followerCounts.entrySet()) {
        if (entry.getValue() == maxFollowers) {
            list.add(entry.getKey());
        }
    }
    return list;
}
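
If the specification is read as asking for all distinct usernames in the graph in descending order of follower count, rather than only the top entries, the same inversion can feed a sort. The sketch below is written under that assumption; InfluencerSketch and ranked are hypothetical names, and usernames are assumed to be lower-cased already.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InfluencerSketch {
    /** Returns every username in the graph, most-followed first. */
    public static List<String> ranked(Map<String, Set<String>> followsGraph) {
        Map<String, Integer> followers = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : followsGraph.entrySet()) {
            // a user with no followers still belongs in the result
            followers.putIfAbsent(entry.getKey(), 0);
            for (String followed : entry.getValue()) {
                followers.merge(followed, 1, Integer::sum);
            }
        }
        List<String> names = new ArrayList<>(followers.keySet());
        // descending by follower count; the order among ties is unspecified
        names.sort((a, b) -> Integer.compare(followers.get(b), followers.get(a)));
        return names;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = Map.of(
                "a", Set.of("c"),
                "b", Set.of("c", "a"));
        System.out.println(ranked(graph));
    }
}
```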
