Wednesday, March 28, 2012

No errors. No replication, either...

I have one central publisher server and several remote merge pull boxes. On random remote servers, at random times, replication will fail. The data is inconsistent between the master publisher and the remote merge pull subscribers. (i.e., The subscribers have more recent data than the publisher.)

There are no error messages generated, and it's only for one article. The rest seem to update fine.

I am a complete noob when it comes to replication. How would I even start troubleshooting this? :S

Any assistance is appreciated. Thanks.

-D.http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=85908|||If it is a pull subscription, then does that not imply that the replication is one way? And then if the clients have more recent data it would indicate that data is being entered directly into the clients.|||Pootle_Flump,

Should I only be posting on one board?

Blindman,

The way I understand it is supposed to work is that data is entered on the remote subscribers, merged at the publisher, and then a snapshot pushes everything back down to the subscribers. The merges run throughout the day, and then snapshots are sent down at night, every couple of weeks.

I've checked the job logs on everything, and at the subscriber, I consistently see "No data needed to be merged." At the publisher, I do occasionally see an error like "The process could not bulk copy out of table 'contB83E8AA0AE6B4DBBBC147EAFA47D75F3'.", but the snapshot eventually succeeds, and the data matches more recently than that snapshot, with individual subscribers.

What to look at next? It's the "No data needed to be merged." that gets me...|||Pootle_Flump,

Should I only be posting on one board?Well - my preference is one at a time but I've got over that hang-up :) Some people consider it bad netiquette and will throw a wobbly about it. Some will avoid any cross-board posts.

The reason is simply that you can find yourself toiling away with a problem to find that others on another board have been privvy to info that would have helped you or,worse, you go down a blind alley that has already been covered elsewhere. Sorry to say as well that most users that cross post tend (IME) to have less invested in their threads and so are more likely to go AWOL too.

Anyway - I posted the link for our contributers to ensure that they can review what has gone on at SQLTeam before helping you out here. If it is resolved over there or there is a useful nugget of info then they can find out and not waste their time.
[/semi rant]

Ideally I would be grateful if you could keep the two threads in synch if any "break throughs" are made in either and also inform both if you manage to sort out your problem :) I think then everyone gets the best of both worlds.|||... and finally

I have not used merge replication so can't rather than won't help!|||Well - my preference is one at a time but I've got over that hang-up :) Some people consider it bad netiquette and will throw a wobbly about it. Some will avoid any cross-board posts.

...

Ideally I would be grateful if you could keep the two threads in synch if any "break throughs" are made in either and also inform both if you manage to sort out your problem :) I think then everyone gets the best of both worlds.

Gotcha. Makes total sense. I'll be more than happy to keep everyone on both boards in the loop.

Thanks.

-D.|||... and finally

I have not used merge replication so can't rather than won't help!

No sweat. Nobody is an expert on everything. ( I'm living proof of that! :D )

Thanks.

-D.|||I have one central publisher server and several remote merge pull boxes. On random remote servers, at random times, replication will fail. The data is inconsistent between the master publisher and the remote merge pull subscribers. (i.e., The subscribers have more recent data than the publisher.)

There are no error messages generated, and it's only for one article. The rest seem to update fine.

I am a complete noob when it comes to replication. How would I even start troubleshooting this? :S

Any assistance is appreciated. Thanks.

-D.

I'll give this a crack, only because we are in the process of rolling out merge replication to 60+ nodes (it's really ugly).

First:
- Which version of SQL Server (for publisher, distributor and subscriber(s))?
- What type of replication (snapshot, merge, or transactional)?
- What type of subscription (push or pull)?
- Are there horizontal filters on the articles?
- Are the publisher and the distributor on the same server?

I believe you are using Merge Replication (please correct me if I am wrong):

1. Look for conflicts (depending on version, right-click on the publication and select "View Conflicts")

- If there are conflicts, you will have to resolve each one individually

2. The way merge replication works (in a nutshell):

- There are three triggers on each article (table): insert, update, delete

- When DML changes data on the table, the appropriate trigger fires and places an entry into either msmerge_contents (insert, update) or msmerge_tombstone (delete). The data inserted into these tables includes the article nickname (an integer) and the rowguid of the updated row.

- When the merge agent next runs, it reads changes at the subscriber (ie, records in the appropriate msmerge_ table), then changes in the publisher. It tracks the history of changes using an integer value called generation.

If you are getting a message that no updates were made, then it means that no new entries were written (either at the publisher or the subscriber) into these msmerge_ tables.

OR POSSIBLY: if you are using horizontal filters on the articles, then it might mean that the merge agent has determined that the filter has excluded the particular subscriber from receiving updates.

I have tried merge replication (with dynamic horizontal filters) in both SQL 2000 and SQL 2005. I wholeheartedly recommend ditching SQL 2000 in favor of SQL 2005 (for both the publisher and the distributor). It's infinitely easier to set up publications and to manage replication (better reports, better conflict management, better alerts, just better).

You can also try this site for some of your questions:
http://www.replicationanswers.com/Default.asp

Be prepared for many long hours of reading to become comfortable with replication (esp Merge Replication). There are many quirks and features that may lead to less hair and less sleep.

Regards,

hmscott|||Greatly appreciated, hmscott.

- We are using both snapshot and merge replication. The snapshots appear to work fine. The merge is where the issue is.
- Both are pull subscriptions.
- There are no filters on any of the articles, either the snapshot or merge articles. They're different sets of articles.
- The publisher and distributor are indeed on the same server.

1. I'm not in the office, but I will look for conflicts when I get back in. I believe that conflict resolution was handled automatically in our configuration, if I recall correctly.

2. Let me see if I understand this correctly. When a change to a merge replicated table is made, a note is made in either the msmerge_contents table or the msmerge_tombstone table.

Since we're not using any filters on the articles, I'm wondering if the msmerge_contents or _tombstone tables are having issues, or maybe the triggers aren't working properly? Do either of those scenarios make sense?

Sadly, ditching 2k for 2k5 is not currently an option. Management prefers the "Even if it's broken, don't fix it! It may make it worse!" method of problem management.

I'll check the link out over the holiday. Thanks for the info.

And I'm bald already, so I should be good to go. :)

Thanks again for your help. I'll update again when I can find out more.

-D.|||I'll be in and out (mostly out) over the holiday.

I'm uncomfortable with the merge/snapshot thing. Maybe it's because I'm not understanding it correctly: is the same article involved in both merge and snapshot replication?

If so, I'm out of my league already; I didn't think that was possible.

But I'll look a bit further to see what I can find.

You might also consider having a look at sp_msmergedummyupdate (in SQL BOL). It might help "force" an update for replication.

Regards,

hmscott|||I'm uncomfortable with the merge/snapshot thing. Maybe it's because I'm not understanding it correctly: is the same article involved in both merge and snapshot repication?

If so, I'm out of my league already; I didn't think that was possible.

I don't think it is either. The article in question is only in the merge replication.

I'll definitely check out that sproc. Replication runs often enough during the day that it should be pretty easy to test.

Thanks again.

-D.|||Be prepared for many long hours of reading to become comfortable with replication (esp Merge Replication). There are many quirks and features that may lead to less hair and less sleep.

In this regard have you found a one-stop-shop for learning the minutiae of merge replication?
The best single resource I've found for transactional & snapshot replication has to be:
http://www.amazon.com/Guide-Server-Transactional-Snapshot-Replication/dp/0974973602/ref=sr_1_1/103-1393738-3011844|||In this regard have you found a one-stop-shop for learning the minutiae of merge replication?
The best single resource I've found for transactional & snapshot replication has to be:
http://www.amazon.com/Guide-Server-Transactional-Snapshot-Replication/dp/0974973602/ref=sr_1_1/103-1393738-3011844

You're right; I should have mentioned Mr. Cotter's book. As far as I am aware, Mr. Cotter is the expert on SQL replication.

Mostly, what I did was set up a Sandbox environment to test out various aspects of replication (mostly merge replication). I then repeated a number of tests and watched SQL Profiler (to see individual transactions) and the contents of various msmerge_ tables to see what transpired "under the hood". It was tedious but enlightening.

Regards,

hmscott|||From strange to stranger.

One of the developers on my team (my lead, actually) stated that the problem is fixed. In master..sysobjects, the replinfo column for that table object had been changed to 0 instead of 128, so the table was somehow not marked for replication. Nobody did this in EM or via script as far as we are aware, and the blame is currently being laid on a third party application. He marked that table for replication again, and it's working now.

Is that even possible? I thought ad-hoc updates to system tables weren't allowed. Unless the third-party app is running replication-altering sprocs, in which case, I'm scared.

I'm going to call this closed, for now. Thanks to everyone for their input. I learned a lot from this.

-D.|||From strange to stranger.

One of the developers on my team (my lead, actually) stated that the problem is fixed. In master..sysobjects, the replinfo column for that table object had been changed to 0 instead of 128, so the table was somehow not marked for replication. Nobody did this in EM or via script as far as we are aware, and the blame is currently being laid on a third party application. He marked that table for replication again, and it's working now.

Is that even possible? I thought ad-hoc updates to system tables weren't allowed. Unless the third-party app is running replication-altering sprocs, in which case, I'm scared.

I'm going to call this closed, for now. Thanks to everyone for their input. I learned a lot from this.

-D.

ad-hoc updates to system tables are allowed, but you have to run sp_configure 'allow updates' and set the value to 1. I don't generally recommend that you do this unless you know what values you are going to change and why.

You can try running profiler when the 3rd-party app is running to see what's going on under the hood. Warning: you may not like what you see.

Regards,

hmscott

No comments:

Post a Comment