当前位置: 首页 > 知识库问答 >
问题:

获取字符串中特定位置的多个字符替换的所有组合

吕奇
2023-03-14

我有一个像下面这样的data.table,它包含一组DNA字符串(ref_seq列)和一系列突变(alt列中的字符)以及它们沿字符串(snp_position列)的相对位置。

因为对于n个突变,得到的字符串数是2^n,我想做的是生成一个通用函数,允许生成在ref\u-seq字符串中的alt字符的所有组合,同时(最好)保留该数据。框架结构

> dt
   seqnames hap_start  hap_end      snp alt snp_position numb_snps_per_hap
1:     chr1  19600274 19600443 19600324   G           50                 4
2:     chr1  19600274 19600443 19600378   C          104                 4
3:     chr1  19600274 19600443 19600389   A          115                 4
4:     chr1  19600274 19600443 19600396   C          122                 4
5:    chr10   5482730  5482899  5482790   C           60                 4
6:    chr10   5482730  5482899  5482830   A          100                 4
7:    chr10   5482730  5482899  5482839   A          109                 4
8:    chr10   5482730  5482899  5482843   A          113                 4
                                                                                                                                                                      ref_seq
1: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
2: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
3: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
4: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
5: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
6: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
7: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
8: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
> dput(dt)
structure(list(seqnames = c("chr1", "chr1", "chr1", "chr1", "chr10", 
"chr10", "chr10", "chr10"), hap_start = c(19600274, 19600274, 
19600274, 19600274, 5482730, 5482730, 5482730, 5482730), hap_end = c(19600443, 
19600443, 19600443, 19600443, 5482899, 5482899, 5482899, 5482899
), snp = c(19600324L, 19600378L, 19600389L, 19600396L, 5482790L, 
5482830L, 5482839L, 5482843L), alt = c("G", "C", "A", "C", "C", 
"A", "A", "A"), snp_position = c(50, 104, 115, 122, 60, 100, 
109, 113), numb_snps_per_hap = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L), ref_seq = c("CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC", 
"CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC", 
"CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC", 
"CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC", 
"TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT", 
"TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT", 
"TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT", 
"TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT"
)), row.names = c(NA, -8L), class = c("data.table", "data.frame"
), sorted = "seqnames", .internal.selfref = <pointer: 0x1d50f80>)

期望输出

> dt_final
    seqnames hap_start  hap_end
 1:     chr1  19600274 19600443
 2:     chr1  19600274 19600443
 3:     chr1  19600274 19600443
 4:     chr1  19600274 19600443
 5:     chr1  19600274 19600443
 6:     chr1  19600274 19600443
 7:     chr1  19600274 19600443
 8:     chr1  19600274 19600443
 9:     chr1  19600274 19600443
10:     chr1  19600274 19600443
11:     chr1  19600274 19600443
12:     chr1  19600274 19600443
13:     chr1  19600274 19600443
14:     chr1  19600274 19600443
15:     chr1  19600274 19600443
16:     chr1  19600274 19600443
17:    chr10   5482730  5482899
18:    chr10   5482730  5482899
19:    chr10   5482730  5482899
20:    chr10   5482730  5482899
21:    chr10   5482730  5482899
22:    chr10   5482730  5482899
23:    chr10   5482730  5482899
24:    chr10   5482730  5482899
25:    chr10   5482730  5482899
26:    chr10   5482730  5482899
27:    chr10   5482730  5482899
28:    chr10   5482730  5482899
29:    chr10   5482730  5482899
30:    chr10   5482730  5482899
31:    chr10   5482730  5482899
32:    chr10   5482730  5482899
    seqnames hap_start  hap_end
                                                                                                                                                                      sequence
 1: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 2: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 3: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAAGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 4: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAGGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 5: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 6: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 7: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 8: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
 9: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
10: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAGGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
11: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAAGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
12: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAGGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
13: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACCGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
14: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATTCGGAGCCCCAAGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
15: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAGGCCGCCCAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
16: CCAGGGAGAGCGGGGCAGCGCACGGCTGGTGACAGGCTCAGCTCTGCACGGTGCAGAGGGAGGATCAGGGGCCACTGTTACCTCTGCAGTCGCTGCTGCCCATCCGGAGCCCCAAGCCGCCAAGGATGGTCTCGGACTGGCCGTCGCTGTACAGGAAGGCCGTGTCTATC
17: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
18: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
19: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAATGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
20: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAGTGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
21: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
22: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
23: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
24: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
25: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
26: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAGTGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
27: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAATGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
28: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAGTGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
29: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTGTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
30: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTTACTAGTGAATGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
31: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAGTGAATCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
32: TGCCCCTCTCTTGTTCATTTGCCACCACTACTATGTAAGGCATTGCCTCACTACTCTGTCTCTCATCCCTATTCATACCTGTAGTCCTTATCACCATTTAACTAGTGAATGAGTCTGTTTCAGAGTCCCATCCTGTGTTTCTGATCTCTAGGACCAATCTCAGTGTCCAT
                                                                                                                                                                      sequence

你知道如何在R中做到这一点吗?

谢啦


共有2个答案

汪臻
2023-03-14

下面是一个使用简化示例输入数据的选项。很多步骤都太详细了,试着一点一点地运行它来理解每个函数。

基本上,我在refseq上拆分,然后将refseq拆分为字母,然后用alt更新位置组合,然后粘贴回一起。

df <- data.frame(alt = c("A","B", "C", "D", "E"),
                 snp_position = c(2, 4, 1, 3, 5),
                 refseq = c("-----", "-----",
                            "=====", "=====", "====="),
                 stringsAsFactors = FALSE)

res <- stack(
  lapply(split(df, df$refseq), function(i){
    s <- strsplit(i$refseq[ 1 ], "")[[ 1 ]]
    pos <- i$snp_position
    alt <- i$alt
    unlist(
      lapply(seq(nrow(i)), function(j) 
        combn(pos, j,
              FUN = function(k){
                res <- s
                res[ k ] <- alt[ which(pos %in% k) ]
                paste(res, collapse = "")
              }))
    )
  }))
colnames(res) <- c("sequence", "refseq")

res
#      sequence refseq
#   1     -A---  -----
#   2     ---B-  -----
#   3     -A-B-  -----
#   4     C====  =====
#   5     ==D==  =====
#   6     ====E  =====
#   7     C=D==  =====
#   8     C===E  =====
#   9     ==D=E  =====
#   10    C=D=E  =====
王建华
2023-03-14

我只使用了前两行和感兴趣的列。我还缩小了refseq的大小,并使用snp_位置=2来更好地可视化。但是,只要列名相同,它就会对您的完整数据起同样的作用。

输入:

df <- data.frame(alt=c("G","A"),snp_position=c(2,2),refseq=c("CCAGGGAGAGCGGGGCAGCGC",
                                                                "TCAGGGAGAGCGGGGCAGCGC"))

功能:

  seq_mut <- function(refseq,snp_position,Mutated_allele){
  
  combinations <- c("A","T","C","G")
  WT_allele <- substr(refseq, snp_position, snp_position)
  combinations <- combinations[combinations!=WT_allele]
  
  refseq2 <- refseq
  substr(refseq2, snp_position, snp_position) <- Mutated_allele
  combinations <- combinations[combinations!=Mutated_allele]
  
  seq <- refseq2
  for(variant in combinations){
    refseq2 <- refseq
    substr(refseq2, snp_position, snp_position) <- variant
    seq <- paste0(seq,";",refseq2)
  }
  
  return(seq)
}

使用dplyr:

library(dplyr)
df <- df %>% rowwise() %>% mutate(ref_seq_mut=seq_mut(refseq,snp_position,alt))

三年:

library(tidyr)
df <- df %>% separate(ref_seq_mut,into=c("V1","V2","V3"))%>% 
  pivot_longer(cols=refseq:V3,values_to="refseq") %>% select(-name)

输出:

alt   snp_position refseq               
  <chr>        <dbl> <chr>                
1 G                2 CCAGGGAGAGCGGGGCAGCGC
2 G                2 CGAGGGAGAGCGGGGCAGCGC
3 G                2 CAAGGGAGAGCGGGGCAGCGC
4 G                2 CTAGGGAGAGCGGGGCAGCGC
5 A                2 TCAGGGAGAGCGGGGCAGCGC
6 A                2 TAAGGGAGAGCGGGGCAGCGC
7 A                2 TTAGGGAGAGCGGGGCAGCGC
8 A                2 TGAGGGAGAGCGGGGCAGCGC
 类似资料:
  • 问题内容: 我正在读csv到: 我想用空字符串替换该行的第8个元素中的所有匹配项。该功能不起作用。 有一个更好的方法吗? 问题答案: 问题是您对的结果不做任何事情。在Python中,字符串是不可变的,因此任何处理字符串的操作都将返回新字符串,而不是修改原始字符串。

  • 我有一个包含以下列的表: 然后,我手动将更新sql编写为 现在,这个解决方案对我来说并不现实。我查看了以下与Regex相关的链接和它周围的其他链接。 更新和替换字符串的一部分 https://www.codeproject.com/questions/456246/replace-special-characters-in-sql 我如何编写能够处理所有这些特殊字符的更新sql?

  • 问题内容: 我有一些带有以下格式的方程式的字符串。 我还有一个文本文件,其中包含每个变量的名称,例如: 等等… 什么是对我来说,写代码的最佳方式,使其在插头到处发生,并且对等? 问题答案: 对于string ,请使用以下函数:

  • 我希望我的程序替换输入字符串中的每个元音。

  • 我有一个字符串“ECET”,我想创建所有可能的字符串,其中我用“X”替换一个或多个字母(除第一个外)。 在这种情况下,我的结果是: 关于如何处理这个问题有什么想法吗? 这不仅仅是创建“X”的可能组合/排列,还包括如何将它们与现有字符串组合。