NLP:LSTM之父眼中的深度学习十年简史《The 2010s: Our Decade of Deep Learning / Outlook on the 2020s》的参考文献
目录
The 2010s: Our Decade of Deep Learning / Outlook on the 2020s
References Beyond Those in Reference [MIR]
Selected References from Reference [MIR]
[MIR] J. Schmidhuber (2019). Deep Learning: Our Miraculous Year 1990-1991. Containing most references cited above. For convenience also appended below. Compare reddit posts [R2-R8] influenced by ref [MIR] (although my name is frequently misspelled).
[BW] H. Bourlard, C. J. Wellekens (1989). Links between Markov models and multilayer perceptrons. NIPS 1989, p. 502-510.
[BRI] Bridle, J.S. (1990). Alpha-Nets: A Recurrent "Neural" Network Architecture with a Hidden Markov Model Interpretation, Speech Communication, vol. 9, no. 1, pp. 83-92.
[BOU] H Bourlard, N Morgan (1993). Connectionist speech recognition. Kluwer, 1993.
[HYB12] Hinton, G. E., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., and Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82-97.
[LSTM14] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. IJCAI 07, p. 774-779, Hyderabad, India, 2007 (talk).PDF.
[LSTM15] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, p 545-552, Vancouver, MIT Press, 2009. PDF.
[LSTM16] M. Stollenga, W. Byeon, M. Liwicki, J. Schmidhuber. Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation. Advances in Neural Information Processing Systems (NIPS), 2015. Preprint: arxiv:1506.07452.
[TR1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin (2017). Attention is all you need. NIPS 2017, pp. 5998-6008.
[TR2] J. Devlin, M. W. Chang, K. Lee, K. Toutanova (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. Preprint arXiv:1810.04805.
[SLG] S. Le Grand. Medium (2019). TLDR: Schmidhuber's Lab did it first. Link.
[AC18] Y. Burda, H. Edwards, D. Pathak, A. Storkey, T. Darrell, and A. A. Efros. Large-scale study of curiosity-driven learning. Preprint arXiv:1808.04355, 2018.
[T94] G. Tesauro. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation 6:2, p 215-219, 1994.
[DM4] Mastering the game of Go with deep neural networks and tree search. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Nature 529:7587, p 484-489, 2016.
[CAR1] Prof. Schmidhuber's highlights of robot car history (2007, updated 2011). Link.
[NAT2] D. Heaven. Why deep-learning AIs are so easy to fool. Nature 574, 163-166 (2019). Link. ["A baby doesn't learn by downloading data from Facebook," says Schmidhuber.]
[SV1] S. Zuboff (2019). The age of surveillance capitalism. The Fight for a Human Future at the New Frontier of Power. NY: PublicAffairs.
[SV2] Facial recognition changes China. Twitter discussion@hardmaru
[META10] T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 5(6):4650, 2010.
[META17] R. Miikkulainen, Q. Le, K. Stanley, C. Fernando. NIPS 2017 Metalearning Symposium.
[LIP1] M. Wand, J. Koutnik, J. Schmidhuber. Lipreading with Long Short-Term Memory. Proc. ICASSP, p 6115-6119, 2016.
[DR16] A Giusti, J Guzzi, DC Ciresan, F He, JP Rodriguez, F Fontana, M Faessler, C Forster, J Schmidhuber, G Di Caro, D Scaramuzza, LM Gambardella (2016): First drone with onboard vision based on deep neural nets learns to navigate in the forest. Youtube video (Feb 2016).
[DNC2] R. Csordas, J. Schmidhuber. Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control. International Conference on Learning Representations (ICLR 2019). PDF.
[UDRL] Upside Down Reinforcement Learning (2019). Google it.
[K96] Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 237-285.
[H13] M. Hausknecht, J. Lehman, R. Miikkulainen, P. Stone. A Neuroevolution Approach to General Atari Game Playing. IEEE Transactions on Computational Intelligence and AI in Games, 16 Dec. 2013.
[LOC] S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. PDF. Neural Computation 11(3): 679-714, 1999
[OBJ1] Greff, K., Rasmus, A., Berglund, M., Hao, T., Valpola, H., Schmidhuber, J. (2016). Tagger: Deep unsupervised perceptual grouping. NIPS 2016, pp. 4484-4492.
[OBJ2] Greff, K., Van Steenkiste, S., Schmidhuber, J. (2017). Neural expectation maximization. NIPS 2017, pp. 6691-6701.
[OBJ3] van Steenkiste, S., Chang, M., Greff, K., Schmidhuber, J. (2018). Relational neural expectation maximization: Unsupervised discovery of objects and their interactions. ICLR 2018.
[IG] X Chen, Y Duan, R Houthooft, J Schulman, I Sutskever, P Abbeel (2016). Infogan: Interpretable representation learning by information maximizing generative adversarial nets. NIPS 2016, pp. 2172-2180.
[WAV1] van Steenkiste, S., Koutnik, J., Driessens, K., Schmidhuber, J. (July 2016). A wavelet-based encoding for neuroevolution. GECCO 2016, pp. 517-524.
[OAI3] Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I. (2017). Evolution strategies as a scalable alternative to reinforcement learning. Preprint arXiv:1703.03864.
[MIR]-related discussions (2019) with many comments at reddit/ml (the largest machine learning forum with over 800k subscribers), ranked by votes (my name is often misspelled):
[R2] Reddit/ML, 2019. JS really had GANs in 1990. Link.
[R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco. Link.
[R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by JS. Link.
[R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century. Link.
[R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in JS' team, won 4 image recognition challenges prior to AlexNet. Link.
[R7] Reddit/ML, 2019. JS on Seppo Linnainmaa, inventor of backpropagation in 1970. Link.
[R8] Reddit/ML, 2019. JS on Alexey Ivakhnenko, godfather of deep learning 1965. Link.
Below a few selected interviews of the 2010s in newspapers and magazines (use DeepL or Google Translate (Sec. 1) to translate German texts). Hundreds of additional interviews and news articles (mostly in English or German) can be found through search engines.
[ACM16] ACM interview by S. Ibaraki (2016). Chat with J. Schmidhuber: Artificial Intelligence & Deep Learning - Now & Future.Link.
[INV16] J. Carmichael. AI gained consciousness in 1991... J. Schmidhuber is convinced the ultimate breakthrough already happened. Inverse, Dec 2016. Link.
[SR18] JS interviewed by Swiss Re (2018): The intelligence behind artificial intelligence. Link.
[CNNTV2] JS interviewed by CNNmoney (2019): Part 2 on a healthcare data market where every patient can become a micro-entrepreneur. (Part 1 is more general.)
[FA15] Intelligente Roboter werden vom Leben fasziniert sein. (Intelligent robots will be fascinated by life.) FAZ, 1 Dec 2015. Link.
[SP16] JS interviewed by C. Stoecker: KI wird das All erobern. (AI will conquer the universe.) SPIEGEL, 6 Feb 2016. Link.
[FA18] KI ist eine Riesenchance für Deutschland. (AI is a huge chance for Germany.) FAZ, 2018. Link.
[SPE17] JS interviewed by P. Hummel: Ein Wettrüsten wird sich nicht verhindern lassen. (An AI arms race is inevitable.) Spektrum, 28 Aug 2017. Link.
[CAR2] Interview with J. Schmidhuber at the Geneva Motor Show 2019: KI wird die Autobranche revolutionieren. (AI will revolutionise the car industry.) Blick, 11/03/2019. Link.
[FATV] AI & Economy. Public Night Talk with J. Schmidhuber, organised by FAZ and Hertie Stiftung (2019, in German). Youtube link.
[DL1] J. Schmidhuber, 2015. Deep Learning in neural networks: An overview. Neural Networks, 61, 85-117. More.
[DL2] J. Schmidhuber, 2015. Deep Learning. Scholarpedia, 10(11):32832.
[DL4] J. Schmidhuber, 2017. Our impact on the world's most valuable public companies: 1. Apple, 2. Alphabet (Google), 3. Microsoft, 4. Facebook, 5. Amazon ... HTML.
[DLC] J. Schmidhuber, 2015. Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436). June 2015. HTML.
[AV1] A. Vance. Google Amazon and Facebook Owe Jürgen Schmidhuber a Fortune - This Man Is the Godfather the AI Community Wants to Forget. Business Week, Bloomberg, May 15, 2018.
[KO0] J. Schmidhuber. Discovering problem solutions with low Kolmogorov complexity and high generalization capability. Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München, 1994. PDF. Also at ICML'95.
[KO2] J. Schmidhuber. Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5):857-873, 1997. PDF.
[CO1] J. Koutnik, F. Gomez, J. Schmidhuber (2010). Evolving Neural Networks in Compressed Weight Space. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2010), Portland, 2010. PDF.
[CO2] J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Amsterdam, July 2013. PDF.
[CO3] R. K. Srivastava, J. Schmidhuber, F. Gomez. Generalized Compressed Network Search. Proc. GECCO 2012. PDF.
[DM1] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with Deep Reinforcement Learning. Tech Report, 19 Dec. 2013, arxiv:1312.5602.
[DM2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, p 1529, 26 Feb. 2015. Link.
[DM3] S. Stanford. DeepMind's AI, AlphaStar Showcases Significant Progress Towards AGI. Medium ML Memoirs, 2019. [Alphastar has a "deep LSTM core."]
[OAI1] G. Powell, J. Schneider, J. Tobin, W. Zaremba, A. Petron, M. Chociej, L. Weng, B. McGrew, S. Sidor, A. Ray, P. Welinder, R. Jozefowicz, M. Plappert, J. Pachocki, M. Andrychowicz, B. Baker. Learning Dexterity. OpenAI Blog, 2018.
[OAI2] OpenAI et al. (Dec 2019). Dota 2 with Large Scale Deep Reinforcement Learning. Preprint arxiv:1912.06680. [An LSTMcomposes 84% of the model's total parameter count.]
[OAI2a] J. Rodriguez. The Science Behind OpenAI Five that just Produced One of the Greatest Breakthrough in the History of AI. Towards Data Science, 2018. [An LSTM was the core of OpenAI Five.]
[MC43] W. S. McCulloch, W. Pitts. A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, Vol. 5, p. 115-133, 1943.
[K56] S.C. Kleene. Representation of Events in Nerve Nets and Finite Automata. Automata Studies, Editors: C.E. Shannon and J. McCarthy, Princeton University Press, p. 3-42, Princeton, N.J., 1956.
[VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J.S.) PDF. [More on the Fundamental Deep Learning Problem.]
[VAN3] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, eds., A Field Guide to Dynamical Recurrent Neural Networks. IEEE press, 2001. PDF.
[LSTM0] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory.TR FKI-207-95, TUM, August 1995. PDF.
[LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. PDF. Based on [LSTM0]. More.
[LSTM2] F. A. Gers, J. Schmidhuber, F. Cummins. Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000. PDF. [The "vanilla LSTM architecture" that everybody is using today, e.g., in Google's Tensorflow.]
[LSTM3] A. Graves, J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18:5-6, pp. 602-610, 2005. PDF.
[LSTM4] S. Fernandez, A. Graves, J. Schmidhuber. An application of recurrent neural networks to discriminative keyword spotting. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007. PDF.
[LSTM5] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A Novel Connectionist System for Improved Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009. PDF.
[LSTM6] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. NIPS'22, p 545-552, Vancouver, MIT Press, 2009. PDF.
[LSTM7] J. Bayer, D. Wierstra, J. Togelius, J. Schmidhuber. Evolving memory cell structures for sequence learning. Proc. ICANN-09, Cyprus, 2009. PDF.
[LSTM8] A. Graves, A. Mohamed, G. E. Hinton. Speech Recognition with Deep Recurrent Neural Networks. ICASSP 2013, Vancouver, 2013. PDF.
[LSTM9] O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever, G. Hinton. Grammar as a Foreign Language. Preprint arXiv:1412.7449 [cs.CL].
[LSTM10] A. Graves, D. Eck and N. Beringer, J. Schmidhuber. Biologically Plausible Speech Recognition with LSTM Neural Nets. In J. Ijspeert (Ed.), First Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland, p. 175-184, 2004. PDF.
[LSTM11] N. Beringer and A. Graves and F. Schiel and J. Schmidhuber. Classifying unprompted speech by retraining LSTM Nets. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3696, pp. 575-581, Springer-Verlag Berlin Heidelberg, 2005.
[LSTM12] D. Wierstra, F. Gomez, J. Schmidhuber. Modeling systems with internal state using Evolino. In Proc. of the 2005 conference on genetic and evolutionary computation (GECCO), Washington, D. C., pp. 1795-1802, ACM Press, New York, NY, USA, 2005. Got a GECCO best paper award.
[LSTM13] F. A. Gers and J. Schmidhuber. LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages. IEEE Transactions on Neural Networks 12(6):1333-1340, 2001. PDF.
[NAS] B. Zoph, Q. V. Le. Neural Architecture Search with Reinforcement Learning. Preprint arXiv:1611.01578 (PDF), 2017.
[S2S] I. Sutskever, O. Vinyals, Quoc V. Le. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems (NIPS), 2014, 3104-3112.
[CTC] A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 06, Pittsburgh, 2006. PDF.
[GSR15] Dramatic improvement of Google's speech recognition through LSTM: Alphr Technology, Jul 2015, or 9to5google, Jul 2015
[META1] J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Tech Univ. Munich, 1987. HTML.
[FASTMETA1] J. Schmidhuber. Steps towards `self-referential' learning. Technical Report CU-CS-627-92, Dept. of Comp. Sci., University of Colorado at Boulder, November 1992.
[FASTMETA2] J. Schmidhuber. A self-referential weight matrix. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446-451. Springer, 1993. PDF.
[FASTMETA3] J. Schmidhuber. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton, pages 191-195. IEE, 1993.
[FAST0] J. Schmidhuber. Learning to control fast-weight memories: An alternative to recurrent nets. Technical Report FKI-147-91, Institut für Informatik, Technische Universität München, March 1991. PDF.
[FAST1] J. Schmidhuber. Learning to control fast-weight memories: An alternative to recurrent nets. Neural Computation, 4(1):131-139, 1992.PDF. HTML. Pictures (German).
[FAST2] J. Schmidhuber. Reducing the ratio between learning complexity and number of time-varying variables in fully recurrent nets. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 460-463. Springer, 1993. PDF.
[FAST3] I. Schlag, J. Schmidhuber. Gated Fast Weights for On-The-Fly Neural Program Generation. Workshop on Meta-Learning, @NIPS 2017, Long Beach, CA, USA.
[FAST3a] I. Schlag, J. Schmidhuber. Learning to Reason with Third Order Tensor Products. Advances in Neural Information Processing Systems (NIPS), Montreal, 2018. Preprint: arXiv:1811.12143. PDF.
[DNC] Hybrid computing using a neural network with dynamic external memory. A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, A. P. Badia, K. M. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, D. Hassabis. Nature, 538:7626, p 471, 2016.
[PDA1] G.Z. Sun, H.H. Chen, C.L. Giles, Y.C. Lee, D. Chen. Neural Networks with External Memory Stack that Learn Context - Free Grammars from Examples. Proceedings of the 1990 Conference on Information Science and Systems, Vol.II, pp. 649-653, Princeton University, Princeton, NJ, 1990.
[PDA2] M. Mozer, S. Das. A connectionist symbol manipulator that discovers the structure of context-free languages. Proc. NIPS 1993.
[WU] Y. Wu et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Preprint arXiv:1609.08144 (PDF), 2016.
[GT16] Google's dramatically improved Google Translate of 2016 is based on LSTM, e.g., WIRED, Sep 2016, or siliconANGLE, Sep 2016
[FB17] By 2017, Facebook used LSTM to handle over 4 billion automatic translations per day (The Verge, August 4, 2017); see alsoFacebook blog by J.M. Pino, A. Sidorov, N.F. Ayan (August 3, 2017)
[LSTM-RL] B. Bakker, F. Linaker, J. Schmidhuber. Reinforcement Learning in Partially Observable Mobile Robot Domains Using Unsupervised Event Extraction. In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002), Lausanne, 2002. PDF.
[HW1] Srivastava, R. K., Greff, K., Schmidhuber, J. Highway networks. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. The first working very deep feedforward nets with over 100 layers. Let g, t, h, denote non-linear differentiable functions. Each non-input layer of a highway net computes g(x)x + t(x)h(x), where x is the data from the previous layer. (Like LSTM with forget gates [LSTM2] for RNNs.) Resnets [HW2] are a special case of this where g(x)=t(x)=const=1. More.
[HW2] He, K., Zhang, X., Ren, S., Sun, J. Deep residual learning for image recognition. Preprint arXiv:1512.03385 (Dec 2015). Residual nets are a special case of highway nets [HW1], with g(x)=1 (a typical highway net initialization) and t(x)=1. More.
[HW3] K. Greff, R. K. Srivastava, J. Schmidhuber. Highway and Residual Networks learn Unrolled Iterative Estimation. Preprintarxiv:1612.07771 (2016). Also at ICLR 2017.
[JOU17] Jouppi et al. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. Preprint arXiv:1704.04760
[CNN1] K. Fukushima: Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979. [More in Scholarpedia.]
[CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987.
[CNN2] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989. PDF.