Mobile edge computing (MEC) is a newly emerging concept that provides significant local computing power and reduces end-to-end latency. In MEC environments, caching frequently accessed services on edge servers effectively reduces latency and improves system responsiveness. An ongoing research topic in such a cachable MEC context is to design novel algorithms for yielding high-quality caching decision that guarantee high user-perceived quality-of-service (QoS) and high system responsiveness of delivery of cached content with the difference of caching capacities of edge servers and diversified content popularity appropriately addressed. In this article, we propose a multi-armed bandits learningbased method busing a Thompson sampling for generating caching decisions. We introduce a genetic multi-armed bandits algorithm (GMAB), which synthesizes the genetic algorithm (GA) and multi-armed bandits (MAB), for optimizing caching effectiveness with timing and space constraints. The experiment results show that GMAB outperforms traditional methods in terms of multiple aspects.