LBSN

Location-Based Social Network Data Generation

The detailed investigation of collective social phenomena requires large amounts of data, because the patterns underlying human behavior are complex and therefore hard to predict. The late Nobel laureate Murray Gell-Mann summed this up aptly: “Think how hard physics would be if particles could think.” Research on location-based social networks (LBSNs) attempts to grapple with this challenge by leveraging social network data for predictive tasks such as Point-of-Interest recommendation [10][11][12], social link prediction [9], and location prediction [1]. The major obstacle, however, is that comprehensive real-world LBSN data sets are hardly available due to privacy implications. In addition, because such data are considered operational data, businesses are unwilling to publish or share them. The largest publicly available LBSN data set is the Gowalla data set [1], with 36M check-ins. After removing users with fewer than 15 check-ins and locations with fewer than 10 visitors, however, only 18.7k users and 1.29M check-ins remain [8]. Distributed over 20 months, that is an average of only 2.1k check-ins per day. Distributed globally, this leaves only a handful of check-ins per day per city, hardly enough to model, explain, and predict mobility. A recent study by Li et al. [7] concludes that "Researchers working with LBSN data sets are often confronted by themselves or others with doubts regarding the quality or the potential of their data sets" and that "it is reasonable to be skeptical" [7].
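The per-day figure quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the average month length of 30.44 days is an assumption made here for illustration.

```python
# Figures quoted in the text (Gowalla after filtering, per [8]).
CHECKINS_AFTER_FILTERING = 1_290_000
MONTHS_COVERED = 20

# Assumption for this sketch: average Gregorian month length in days.
DAYS_PER_MONTH = 30.44


def checkins_per_day(total_checkins, months, days_per_month=DAYS_PER_MONTH):
    """Average number of check-ins per day over the covered period."""
    return total_checkins / (months * days_per_month)


daily = checkins_per_day(CHECKINS_AFTER_FILTERING, MONTHS_COVERED)
print(f"{daily:,.0f} check-ins per day")  # roughly 2.1k
```

Dividing this daily total by the number of cities covered gives the per-city rate; the text does not state that number, so it is left out of the sketch.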

To address this challenge, we have developed an agent-based simulation framework that exhibits realistic social behavior based on data about real-world phenomena and on social science theories. We simulate plausible numbers of agents over years of simulation time and can potentially generate LBSN data for entire generations. The simulation generates high-fidelity LBSN data sets containing complete location and temporal social network data, without uncertainty, collected over long periods of time. We have published this agent-based simulation framework, run many simulations, and made many such data sets available [5]. LBSN research is certainly important, but it is only one of many potential uses of our data sets. Given the current pandemic crisis, the framework could also be used for disease spread simulations.

Figure 1. Environments Populated with Agents. Clockwise from Top Left: GMU, NOLA, Large and Small Synthetic Villages

The four maps in Figure 1 are (i) New Orleans, LA (NOLA), (ii) the George Mason University (GMU) campus, (iii) a small synthetic town (TownS), and (iv) a large synthetic town (TownL). The synthetic maps were created using a spatial network and place generator based on a generative grammar similar to L-systems, as described in [6]. A more detailed description of these data sets can be found in [5].

Each study area was simulated under multiple settings. Table 1 specifies the run-time settings in detail: the area simulated, the type and number of sites, the number of neighborhoods, and the number of agents. We simulated five types of sites: schools, pubs (the Recreation column in Table 1), workplaces, restaurants, and apartments. The actual number of sites of each type and the number of neighborhoods result from an internal computational process. They are derived indirectly from the parameters chosen by the user and cannot be set directly at setup time.

Table 1. Location-Based Social Network Simulation Settings.

Settings	Map	Area (km²)	Total Sites	Schools	Recreation	Workplaces	Restaurants	Apartments	# of Neighborhoods	# of Agents
GMU-1K	GMU	3.36	1,781	1	10	250	20	1,500	1	874
GMU-3K	GMU	3.36	5,341	1	30	750	60	4,500	1	2,589
GMU-5K	GMU	3.36	8,901	1	50	1,250	100	7,500	1	4,648
NOLA-1K	NOLA	6.49	1,781	2	10	250	20	1,500	2	863
NOLA-3K	NOLA	6.49	5,342	2	30	750	60	4,500	2	2,720
NOLA-5K	NOLA	6.49	8,904	4	50	1,250	100	7,500	2	4,728
TownS-1K	TownSm	58.41	1,788	4	12	252	20	1,500	4	876
TownS-3K	TownSm	58.41	5,348	4	32	752	60	4,500	4	2,645
TownS-5K	TownSm	58.41	8,908	4	52	1,252	100	7,500	4	4,349
TownL-1K	TownLg	126.2	1,789	6	12	253	18	1,500	6	853
TownL-3K	TownLg	126.2	5,346	6	30	750	60	4,500	6	2,550
TownL-5K	TownLg	126.2	8,904	6	48	1,248	102	7,500	6	4,216

To demonstrate the feasibility of generating LBSN data over time, Table 2 gives an overview of the output data generated by the location-based social network simulation [5]. All data sets are available at the Open Science Framework (OSF) repository, and each data set can be downloaded directly. For low-bandwidth connections, a pre-compiled executable of each simulation can be downloaded to regenerate the data locally.

The table shows the number of agent check-ins and the number of social links for each scenario. The scenarios combine different numbers of agents (1K, 3K, and 5K) with the four maps. We note that the number of social links may be larger than the square of the number of users. This is due to the temporal nature of the data set: social links may emerge and break over time. Agents meet new friends, but slowly forget about them if the friendship is not reinforced by further meetings. Thus, each link comes with a start time-stamp and an end time-stamp.
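To make the distinction between link events and agent pairs concrete, the following is a minimal sketch of a temporal link record. The class name, field names, and sample values are hypothetical and do not reflect the file format of the published data sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalLink:
    """One friendship interval between two agents (hypothetical schema)."""
    a: int
    b: int
    start: int            # time-stamp at which the link emerges
    end: Optional[int]    # time-stamp at which it breaks; None if still active

    def active_at(self, t: int) -> bool:
        return self.start <= t and (self.end is None or t < self.end)


def active_pairs(links, t):
    """Set of unordered agent pairs that are friends at time t."""
    return {frozenset((l.a, l.b)) for l in links if l.active_at(t)}


# Made-up example: agents 1 and 2 become friends, drift apart, and reconnect.
links = [
    TemporalLink(1, 2, start=0, end=100),
    TemporalLink(1, 2, start=250, end=None),
    TemporalLink(2, 3, start=50, end=300),
]

link_events = len(links)                                       # 3 events
distinct_pairs = len({frozenset((l.a, l.b)) for l in links})   # 2 pairs
```

Because the same pair can appear in several intervals, the count of link events can exceed the count of distinct pairs, which is why it can also exceed the number of possible pairs.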

We observe that, for all study areas, the number of check-ins increases linearly with the number of agents. This is plausible, as the number of hours per day that agents can spend to satisfy their needs and visit sites is independent of other agents. However, the number of social links increases super-linearly in the number of agents. This can be explained by more agents leading to larger co-locations of agents, which creates chances for each pair of agents in the same co-location to become friends. We note that the generated temporal social network may have more edges than there are agent pairs. The network reports changes over time, so a single pair of agents can have multiple friend and unfriend events. The number reported corresponds to the number of new edges added to the temporal social network, regardless of the duration of these events. The super-linear growth of the social network also explains the super-linear run-time needed to create each data set, ranging from less than one hour for the 1,000-agent instances to 10.5 hours for 5,000 agents. Besides (i) the number of check-ins and (ii) the number of social links, we also report (iii) the run-time of each simulation and (iv) the resulting data size in Table 2.
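The scaling behavior can be checked directly against the numbers in Table 2. The sketch below uses the three 15-month GMU rows as an example and compares per-agent rates: a roughly constant rate indicates linear growth, while a rising rate indicates super-linear growth. The variable and function names are chosen here for illustration.

```python
# 15-month GMU rows from Table 2: (agents, check-ins, links).
GMU_ROWS = [
    (874, 2_082_788, 9_114_337),
    (2_589, 6_229_293, 27_650_685),
    (4_648, 11_189_377, 54_250_961),
]


def per_agent_rates(rows):
    """Return (check-ins per agent, links per agent) for each row."""
    return [(checkins / agents, links / agents)
            for agents, checkins, links in rows]


rates = per_agent_rates(GMU_ROWS)
for (agents, _, _), (c_rate, l_rate) in zip(GMU_ROWS, rates):
    print(f"{agents:>5} agents: {c_rate:8.1f} check-ins/agent, "
          f"{l_rate:9.1f} links/agent")
```

For these rows, check-ins per agent stay near 2,400 while links per agent rise with the number of agents. The same check can be applied to the rows of the other maps.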

Table 2. Data Sets Resulting from Location-Based Social Network Simulation

Settings	# of Users	# of Check-Ins	# of Links	Period (months)
GMU-1K	874	2,082,788	9,114,337	15
GMU-1K	874	16,210,909	75,747,439	121
GMU-3K	2,589	6,229,293	27,650,685	15
GMU-5K	4,648	11,189,377	54,250,961	15
NOLA-1K	863	2,099,867	9,160,459	15
NOLA-1K	863	29,597,885	141,425,945	221
NOLA-3K	2,720	6,886,573	27,284,999	15
NOLA-5K	4,728	12,007,415	48,710,881	15
TownS-1K	876	2,101,620	7,643,374	15
TownS-3K	2,645	6,454,785	26,364,057	15
TownS-5K	4,349	10,760,008	45,118,825	15
TownL-1K	853	2,030,688	6,418,473	15
TownL-3K	2,550	6,340,360	22,655,915	15
TownL-5K	4,216	10,548,956	40,431,579	15

Figure 2 shows four example visualizations of the social networks of 1K agents for GMU, NOLA, TownS, and TownL at the end of the 15-month simulation. These visuals show different types of network structures, such as two to three large social communities for the synthetic TownL, and one large community for GMU and NOLA.

Figure 2. Social networks: (a) GMU, (b) NOLA, (c) TownS, (d) TownL. (Note: the location of a node does not represent the location in the spatial network)

Since it is hard to describe the evolution of a social network over time, we have created a video for each of the four spatial areas showing the social network evolution over the 129,600 steps of the 15-month simulation. These videos show how the social networks evolve from small isolated cliques into a large and complex network with different sub-structures.
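The step count also gives a rough idea of the temporal resolution of the simulation. The arithmetic below is an inference rather than a figure stated in the text: it assumes that each simulated month has 30 days.

```python
TOTAL_STEPS = 129_600   # steps quoted in the text
MONTHS = 15             # simulation period quoted in the text

# Assumption for this sketch: every simulated month has 30 days.
DAYS_PER_MONTH = 30

steps_per_day = TOTAL_STEPS / (MONTHS * DAYS_PER_MONTH)
minutes_per_step = 24 * 60 / steps_per_day

print(steps_per_day, minutes_per_step)
```

Under this assumption, one simulation step corresponds to five minutes of simulated time.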

References

[1] E. Cho, S. A. Myers, and J. Leskovec. Friendship and Mobility: User Movement in Location-based Social Networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1082–1090, 2011.

[2] H. Kavak, J.-S. Kim, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle. Location-Based Social Simulation. In SSTD, pages 218–221, 2019.

[3] J.-S. Kim, H. Kavak, U. Manzoor, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle. Simulating urban patterns of life: A geo-social data generation framework. In SIGSPATIAL, pages 576–579, 2019.

[4] J.-S. Kim, H. Jin, H. Kavak, O. C. Rouly, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle. LBSN-data. https://osf.io/e24th (accessed 2020-05-21).

[5] J.-S. Kim, H. Jin, H. Kavak, O. C. Rouly, A. Crooks, D. Pfoser, C. Wenk, and A. Züfle. Location-based Social Network Data Generation Based on Patterns of Life. In IEEE International Conference on Mobile Data Management (MDM’20) (to appear). IEEE, 2020.

[6] J.-S. Kim, H. Kavak, and A. Crooks. Procedural City Generation Beyond Game Development. SIGSPATIAL Special, 10(2):34–41, 2018.

[7] M. Li, R. Westerholt, H. Fan, and A. Zipf. Assessing Spatiotemporal Predictability of LBSN: A Case Study of Three Foursquare Datasets. GeoInformatica, 22(3):541–561, 2018.

[8] Y. Liu, T.-A. N. Pham, G. Cong, and Q. Yuan. An Experimental Evaluation of Point-of-interest Recommendation in Location-based Social Networks. Proceedings of the VLDB Endowment, 10(10):1010–1021, 2017.

[9] S. Scellato, A. Noulas, and C. Mascolo. Exploiting Place Features in Link Prediction on Location-based Social Networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1046–1054, 2011.

[10] H. Wang, M. Terrovitis, and N. Mamoulis. Location recommendation in location-based social networks using user check-in data. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 374–383, 2013.

[11] M. Ye, P. Yin, and W.-C. Lee. Location Recommendation for Location-based Social Networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 458–461, 2010.

[12] J.-D. Zhang and C.-Y. Chow. Point-of-interest Recommendations in Location-based Social Networks. SIGSPATIAL Special, 7(3):26–33, 2016.