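I have a list of authors of Google Scholar papers: Zoe Pikramenou, James H. R. Tucker, Alison Rodger, Timothy Dafforn. I want to extract and print the titles of the papers that have at least 3 of these authors.
You can get the paper information for each author using scholarly: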
from scholarly import scholarly

AuthorList = ['Zoe Pikramenou', 'James H. R. Tucker', 'Alison Rodger', 'Timothy Dafforn']
for Author in AuthorList:
    search_query = scholarly.search_author(Author)
    author = next(search_query).fill()
    print(author)
The output looks something like this (just a small excerpt from one author):
{'bib': {'cites': '69',
'title': 'Chalearn looking at people and faces of the world: Face '
'analysis workshop and challenge 2016',
'year': '2016'},
'filled': False,
'id_citations': 'ZhUEBpsAAAAJ:_FxGoFyzp5QC',
'source': 'citations'},
{'bib': {'cites': '21',
'title': 'The NoXi database: multimodal recordings of mediated '
'novice-expert interactions',
'year': '2017'},
'filled': False,
'id_citations': 'ZhUEBpsAAAAJ:0EnyYjriUFMC',
'source': 'citations'},
{'bib': {'cites': '11',
'title': 'Automatic habitat classification using image analysis and '
'random forest',
'year': '2014'},
'filled': False,
'id_citations': 'ZhUEBpsAAAAJ:qjMakFHDy7sC',
'source': 'citations'},
{'bib': {'cites': '10',
'title': 'AutoRoot: open-source software employing a novel image '
'analysis approach to support fully-automated plant '
'phenotyping',
'year': '2017'},
'filled': False,
'id_citations': 'ZhUEBpsAAAAJ:hqOjcs7Dif8C',
'source': 'citations'}
How can I collect the bib, and specifically the title, of the papers that have three or more of the four authors?
EDIT: In fact, as has been pointed out, id_citations is not unique to each paper; I was wrong. It is better to just use the title itself.
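First, let's convert this into a friendlier format. You said that id_citations is unique to each paper, so we'll use it as the hash table/dict key.
Then we can map each id_citations value to the bib dicts and authors it appears with, as a list of tuples (bib, author_name).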
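from scholarly import scholarly

author_list = ['Zoe Pikramenou', 'James H. R. Tucker', 'Alison Rodger', 'Timothy Dafforn']
bibs = {}
for author_name in author_list:
    search_query = scholarly.search_author(author_name)
    # take the first search hit and fill it, as in the question
    # (assumes the older scholarly API, where a filled author exposes .publications)
    author = next(search_query).fill()
    for pub in author.publications:
        # store each publication as a plain dict so it can be indexed by key later
        bibs.setdefault(pub.id_citations, []).append((pub.__dict__, author_name))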
After that, we can sort the keys in bibs by the number of authors attached to each:
most_cited = sorted(bibs.items(), key=lambda k: len(k[1]))
# most_cited is now a list of tuples (key, value)
# which maps to (id_citation, [(bib1, author1), (bib2, author2), ...])
and/or filter that list down to only the citations that appear three or more times:
cited_enough = [tup[1][0][0] for tup in most_cited if len(tup[1]) >= 3]
# using key [0] in the middle is arbitrary. It can be anything in the
# list, provided the bib objects are identical, but index 0 is guaranteed
# to be there.
# otherwise, the first index is to grab the list rather than the id_citation,
# and the last index is to grab the bib, rather than the author_name
Now we can retrieve the paper titles from there:
paper_titles = [bib['bib']['title'] for bib in cited_enough]
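As the question's edit points out, id_citations is not shared between author profiles, so the same grouping may work better keyed on the title. Below is a minimal sketch of that variant. It assumes hand-written sample data (titles_by_author, with placeholder titles) in place of real scholarly results, and normalize is a hypothetical helper that lower-cases and collapses whitespace so that slightly different spellings of a title still match:

```python
from collections import defaultdict

# Hypothetical sample data standing in for what scholarly would return:
# author name -> titles listed on that author's profile.
titles_by_author = {
    "Zoe Pikramenou": ["Paper A", "Paper B", "Paper C"],
    "James H. R. Tucker": ["Paper A", "paper b ", "Paper D"],
    "Alison Rodger": ["Paper A", "Paper B", "Paper E"],
    "Timothy Dafforn": ["Paper A", "Paper E"],
}


def normalize(title):
    """Lower-case and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


# normalized title -> set of authors whose profile lists it
authors_by_title = defaultdict(set)
# normalized title -> first spelling seen, kept for display
display_title = {}
for author_name, titles in titles_by_author.items():
    for title in titles:
        key = normalize(title)
        authors_by_title[key].add(author_name)
        display_title.setdefault(key, title)

# Sort so that the papers shared by the most authors come first.
ranked = sorted(authors_by_title.items(), key=lambda kv: len(kv[1]), reverse=True)

# Keep only the papers attached to three or more of the authors.
paper_titles = [display_title[key] for key, authors in ranked if len(authors) >= 3]
print(paper_titles)
```

Using a set per title means an author who lists the same paper twice is still counted once. With real data the normalization would probably need to be more forgiving (punctuation, Unicode dashes), which is left out here.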
Expanding on my comment, you can do this with a Pandas groupby:
import pandas as pd
from scholarly import scholarly

AuthorList = ['Zoe Pikramenou', 'James H. R. Tucker', 'Alison Rodger', 'Timothy Dafforn']
frames = []

for Author in AuthorList:
    search_query = scholarly.search_author(Author)
    author = next(search_query).fill()
    # creating DataFrame with authors
    df = pd.DataFrame([x.__dict__ for x in author.publications])
    df['author'] = Author
    frames.append(df.copy())

# joining all author DataFrames; ignore_index=True gives every row a unique
# label so that the bib columns assigned below line up row by row
df = pd.concat(frames, axis=0, ignore_index=True)

# taking bib dict into separate columns
df[['title', 'cites', 'year']] = pd.DataFrame(df.bib.to_list())

# counting unique authors attached to each title
n_authors = df.groupby('title').author.nunique()
# locating the unique titles for all publications with n_authors >= 2
output = n_authors[n_authors >= 2].index
This finds 202 papers that have 2 or more of the authors in that list (out of 774 papers in total). Here is a sample of the output:
Index(['1, 1′-Homodisubstituted ferrocenes containing adenine and thymine nucleobases: synthesis, electrochemistry, and formation of H-bonded arrays',
'722: Iron chelation by biopolymers for an anti-cancer therapy; binding up the'ferrotoxicity'in the colon',
'A Luminescent One-Dimensional Copper (I) Polymer',
'A Unidirectional Energy Transfer Cascade Process in a Ruthenium Junction Self-Assembled by r-and-Cyclodextrins',
'A Zinc(II)-Cyclen Complex Attached to an Anthraquinone Moiety that Acts as a Redox-Active Nucleobase Receptor in Aqueous Solution',
'A ditopic ferrocene receptor for anions and cations that functions as a chromogenic molecular switch',
'A ferrocene nucleic acid oligomer as an organometallic structural mimic of DNA',
'A heterodifunctionalised ferrocene derivative that self-assembles in solution through complementary hydrogen-bonding interactions',
'A locking X-ray window shutter and collimator coupling to comply with the new Health and Safety at Work Act',
'A luminescent europium hairpin for DNA photosensing in the visible, based on trimetallic bis-intercalators',
...
'Up-Conversion Device Based on Quantum Dots With High-Conversion Efficiency Over 6%',
'Vectorial Control of Energy‐Transfer Processes in Metallocyclodextrin Heterometallic Assemblies',
'Verteporfin selectively kills hypoxic glioma cells through iron-binding and increased production of reactive oxygen species',
'Vibrational Absorption from Oxygen-Hydrogen (Oi-H2) Complexes in Hydrogenated CZ Silicon',
'Virginia review of sociology',
'Wildlife use of log landings in the White Mountain National Forest',
'Yttrium 1995',
'ZUSCHRIFTEN-Redox-Switched Control of Binding Strength in Hydrogen-Bonded Metallocene Complexes Stichworter: Carbonsauren. Elektrochemie. Metallocene. Redoxchemie …',
'[2] Rotaxanes comprising a macrocylic Hamilton receptor obtained using active template synthesis: synthesis and guest complexation',
'pH-controlled delivery of luminescent europium coated nanoparticles into platelets'],
dtype='object', name='title', length=202)
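Since all the data is in Pandas, you can also explore which authors are attached to each paper, as well as all the other information you have access to in the author.publications array from scholarly.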
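As a rough sketch of that kind of exploration, the snippet below lists which of the authors are attached to each title. The small DataFrame here is hypothetical sample data standing in for the concatenated df above; only its title and author columns are assumed, and the paper titles are placeholders:

```python
import pandas as pd

# Hypothetical stand-in for the concatenated DataFrame built above.
df = pd.DataFrame({
    "title": ["Paper A", "Paper A", "Paper A", "Paper B", "Paper B", "Paper C"],
    "author": [
        "Zoe Pikramenou", "James H. R. Tucker", "Alison Rodger",
        "Alison Rodger", "Timothy Dafforn", "Timothy Dafforn",
    ],
})

# One row per title: how many distinct authors list it, and who they are.
summary = df.groupby("title")["author"].agg(
    n_authors="nunique",
    authors=lambda s: ", ".join(sorted(set(s))),
)

# Titles attached to three or more of the authors, as asked in the question.
shared = summary[summary["n_authors"] >= 3]
print(shared)
```

Changing the threshold in the last step to 2 corresponds to the filter used in the answer above.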